[
https://issues.apache.org/jira/browse/HAMA-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828158#comment-13828158
]
Martin Illecker commented on HAMA-815:
--------------------------------------
Thanks for testing!
If no one objects within the next few days, I'll assume lazy consensus and
commit it at the end of the week.
> Hama Pipes uses C++ templates
> -----------------------------
>
> Key: HAMA-815
> URL: https://issues.apache.org/jira/browse/HAMA-815
> Project: Hama
> Issue Type: New Feature
> Components: bsp core, pipes
> Affects Versions: 0.6.3
> Reporter: Martin Illecker
> Assignee: Martin Illecker
> Fix For: 0.7.0
>
> Attachments: HAMA-815.patch
>
>
> *Extending Hama Pipes to use C++ templates*
> Currently, all messages are converted to *strings* before they are
> transferred over the socket connection between C++ and Java, and vice versa.
> To take advantage of the binary socket communication, we will serialize and
> deserialize basic types such as *int, long, float, double* directly, without
> converting them to strings. This will minimize the risk of type conversion
> errors. All other types are still transferred as strings.
> It's also possible to create custom *Writables* (e.g., *PipesVectorWritable*
> and *PipesKeyValueWritable*) that serialize the object to a string and
> deserialize it from a string by overriding the following methods:
> {code}
> @Override public void readFields(DataInput in) throws IOException
> @Override public void write(DataOutput out) throws IOException
> {code}
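To make the binary-versus-string distinction concrete, here is a minimal C++ sketch of how a templated helper could write basic types as raw bytes and everything else as a length-prefixed string. The names `serialize` and `deserialize` are hypothetical and are not taken from the patch; the sketch assumes C++11, uses a `std::stringstream` in place of the socket, and uses host byte order, which may well differ from the wire format Hama Pipes actually uses.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helpers (not the Hama Pipes API).
// Arithmetic types (int, long, float, double, ...) are written as raw bytes
// in host byte order, with no string conversion.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value>::type
serialize(std::ostream& out, const T& value) {
  out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Strings are written as a 32-bit length followed by the bytes.
inline void serialize(std::ostream& out, const std::string& value) {
  uint32_t len = static_cast<uint32_t>(value.size());
  out.write(reinterpret_cast<const char*>(&len), sizeof(len));
  out.write(value.data(), len);
}

// Reads back an arithmetic value written by serialize().
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value>::type
deserialize(std::istream& in, T& value) {
  in.read(reinterpret_cast<char*>(&value), sizeof(T));
}

// Reads back a length-prefixed string written by serialize().
inline void deserialize(std::istream& in, std::string& value) {
  uint32_t len = 0;
  in.read(reinterpret_cast<char*>(&len), sizeof(len));
  value.resize(len);
  if (len > 0) {
    in.read(&value[0], len);
  }
}
```

With helpers like these, the message type chosen by the caller decides at compile time which overload is used, so a `double` never passes through a string on its way to the stream.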
> Hama Streaming, which depends on Hama Pipes, is still using strings.
> The following methods change
> {{virtual void sendMessage(const string& peerName, const string& msg)}}
> {{virtual const string& getCurrentMessage()}}
> {{virtual void write(const string& key, const string& value)}}
> {{virtual bool readNext(string& key, string& value)}}
> to support C++ templates:
> {{virtual void sendMessage(const string& peer_name, *const M& msg*)}}
> {{virtual *M* getCurrentMessage()}}
> {{virtual void write(*const K2& key, const V2& value*)}}
> {{virtual bool readNext(*K1& key, V1& value*)}}
> The *SequenceFile* functions also use templates:
> {{bool sequenceFileReadNext(int32_t file_id, *K& key, V& value*)}}
> {{bool sequenceFileAppend(int32_t file_id, *const K& key, const V& value*)}}
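As a rough illustration of what the templated message methods mean for calling code, the following sketch defines a hypothetical in-memory `LoopbackPeer<M>`. Only the `sendMessage` and `getCurrentMessage` signatures follow the issue text; the class name, the `pending()` helper, and the loopback behaviour (the peer name is ignored and messages are queued locally) are assumptions made purely for the example.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a BSP peer, parameterized on the message type M.
template <class M>
class LoopbackPeer {
public:
  virtual ~LoopbackPeer() {}

  // Signature as in the issue text. This sketch ignores peer_name and
  // simply queues the message locally.
  virtual void sendMessage(const std::string& peer_name, const M& msg) {
    (void)peer_name;
    inbox_.push_back(msg);
  }

  // Signature as in the issue text. Returns the oldest queued message,
  // or a default-constructed M if the queue is empty.
  virtual M getCurrentMessage() {
    if (inbox_.empty()) {
      return M();
    }
    M msg = inbox_.front();
    inbox_.pop_front();
    return msg;
  }

  // Hypothetical helper: number of messages still queued.
  std::size_t pending() const { return inbox_.size(); }

private:
  std::deque<M> inbox_;
};
```

A caller working with `LoopbackPeer<double>` sends and receives `double` values directly, so there is no parsing step where a malformed string could turn into a wrong number.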
> And the native *Partitioner* supports it:
> {code}
> template<class K1, class V1, class K2, class V2, class M>
> class Partitioner {
> public:
> virtual int partition(const K1& key, const V1& value, int32_t num_tasks) = 0;
> virtual ~Partitioner() {}
> };
> {code}
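A concrete partitioner built on this interface might look like the sketch below. The `Partitioner` template is repeated from the issue text so the example compiles on its own; `HashPartitioner` is a hypothetical name, and the type arguments (`std::string` for Text, `double` for DoubleWritable) are an assumed mapping, not one confirmed by the patch. C++11 is assumed for `override` and `std::hash`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Interface as quoted in the issue description.
template<class K1, class V1, class K2, class V2, class M>
class Partitioner {
public:
  virtual int partition(const K1& key, const V1& value, int32_t num_tasks) = 0;
  virtual ~Partitioner() {}
};

// Hypothetical partitioner: hashes the input key and maps it onto
// [0, num_tasks). Assumes num_tasks > 0. The value is not used.
class HashPartitioner
    : public Partitioner<std::string, std::string, std::string, double, double> {
public:
  int partition(const std::string& key, const std::string& /*value*/,
                int32_t num_tasks) override {
    std::size_t h = std::hash<std::string>()(key);
    return static_cast<int>(h % static_cast<std::size_t>(num_tasks));
  }
};
```

Because the key and value types are template parameters, the compiler rejects a call that passes, say, an `int` key to this partitioner instead of silently converting it through a string.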
> This will minimize type conversion errors, and it changes the compilation
> procedure. Because of the nature of C++ templates, static libraries are not
> possible anymore: the compiler substitutes all templates at compile time.
> The compile command will look like:
> {code}
> g++ -m64 -Ic++/src/main/native/utils/api \
> -Ic++/src/main/native/pipes/api \
> -Lc++/target/native \
> -lhadooputils -lpthread \
> PROGRAM.cc \
> -o PROGRAM \
> -g -Wall -O2
> {code}
> Finally the job configuration supports the following properties:
> {code}
> <property>
> <name>bsp.input.format.class</name>
> <value>org.apache.hama.bsp.KeyValueTextInputFormat</value>
> </property>
> <property>
> <name>bsp.input.key.class</name>
> <value>org.apache.hadoop.io.Text</value>
> </property>
> <property>
> <name>bsp.input.value.class</name>
> <value>org.apache.hadoop.io.Text</value>
> </property>
> <property>
> <name>bsp.output.format.class</name>
> <value>org.apache.hama.bsp.SequenceFileOutputFormat</value>
> </property>
> <property>
> <name>bsp.output.key.class</name>
> <value>org.apache.hadoop.io.Text</value>
> </property>
> <property>
> <name>bsp.output.value.class</name>
> <value>org.apache.hadoop.io.DoubleWritable</value>
> </property>
> <property>
> <name>bsp.message.class</name>
> <value>org.apache.hadoop.io.DoubleWritable</value>
> </property>
> {code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)