Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The "StreamingProtocol" page has been changed by thomasjungblut: http://wiki.apache.org/hama/StreamingProtocol?action=diff&rev1=3&rev2=4 [[http://people.apache.org/~tjungblut/downloads/streaming_documentation_0.pdf|Immutable Version *.pdf]] - >> will clean this up tomorrow. - == Abstract == The Hama Streaming Protocol is a protocol which enables non-Java applications to write BSP functions just like in pure Java code. The communication takes place with plain strings and is therefore highly portable. Also this is the only difference to Hama Pipes, which is a native implementation for C/C++ fully based on low level binary communications and is therefore faster and has less overhead. In the following chapters, we will introduce the text based protocol and its bidirectional communication, also with the practical example of a Python implementation. @@ -27, +25 @@ Hama now redirects the input and output streams of that child process to streams that are used for the protocol. Therefore you can’t simply print by writing to STDOUT, but you can use the LOG function that redirects special log statements to the log of the Java task, this will be explained later. The protocol is designed to mimic the Java API, for obvious reasons of documentation and convenience, so we suggest to make yourself comfortable with the Java API first and then get into coding the protocol. For additional information about the design and programming model, have a look at our Getting Started section in our wiki and the user documentation as a PDF. - http://wiki.apache.org/hama/GettingStarted#User_documentation_as_PDF + + [[GettingStarted#User_documentation_as_PDF| Getting Started - User Documentation as PDF]] If you want to know how to run a user code within other environments, have a look in the python example at the end. @@ -42, +41 @@ Here is a table of all op codes and their corresponding unique identifier, note that not all of them are in use, because the streaming protocol is working on top of the binary pipes protocol. - Operation Code Name Operation Code identifier Comment + ||Operation Code Name ||Operation Code identifier|| Comment|| - START 0 First op code after fork + ||START ||0|| First op code after fork|| - SET_BSP_JOB_CONF 1 Get configuration values + ||SET_BSP_JOB_CONF ||1|| Get configuration values|| - SET_INPUT_TYPES 2 Not used + ||SET_INPUT_TYPES ||2|| Not used|| - RUN_SETUP 3 Start of the setup function + ||RUN_SETUP ||3|| Start of the setup function|| - RUN_BSP 4 Start of the bsp function + ||RUN_BSP ||4|| Start of the bsp function|| - RUN_CLEANUP 5 Start of the cleanup function + ||RUN_CLEANUP ||5|| Start of the cleanup function|| - READ_KEYVALUE 6 Reads a key/value pair from input (Text Only) + ||READ_KEYVALUE ||6|| Reads a key/value pair from input (Text Only)|| - WRITE_KEYVALUE 7 Writes a key/value pair to output (Text Only) + ||WRITE_KEYVALUE ||7|| Writes a key/value pair to output (Text Only)|| - GET_MSG 8 Gets the next message in the queue + ||GET_MSG ||8|| Gets the next message in the queue|| - GET_MSG_COUNT 9 Get how many messages are in the queue + ||GET_MSG_COUNT ||9|| Get how many messages are in the queue|| - SEND_MSG 10 Send a message + ||SEND_MSG ||10|| Send a message|| - SYNC 11 Start the barrier synchronization + ||SYNC ||11|| Start the barrier synchronization|| - GET_ALL_PEERNAME 12 Get all peer names + ||GET_ALL_PEERNAME ||12|| Get all peer names|| - GET_PEERNAME 13 Get the peer name of the current peer + ||GET_PEERNAME ||13|| Get the peer name of the current peer|| - GET_PEER_INDEX 14 Get a peer name via index + ||GET_PEER_INDEX ||14|| Get a peer name via index|| - GET_PEER_COUNT 15 Get how many peers are there + ||GET_PEER_COUNT ||15|| Get how many peers are there|| - GET_SUPERSTEP_COUNT 16 Get the current Superstep counter + ||GET_SUPERSTEP_COUNT ||16|| Get the current Superstep counter|| - REOPEN_INPUT 17 Reopens the input to reread + ||REOPEN_INPUT ||17|| Reopens the input to reread|| - CLEAR 18 Clears the messaging queue + ||CLEAR ||18|| Clears the messaging queue|| - CLOSE 19 Closes the protocol + ||CLOSE ||19|| Closes the protocol|| - ABORT 20 Not used + ||ABORT ||20|| Not used|| - DONE 21 Closes the protocol if task is done + ||DONE ||21|| Closes the protocol if task is done|| - TASK_DONE 22 Yet another task is done op code + ||TASK_DONE ||22|| Yet another task is done op code|| - REGISTER_COUNTER 23 Not used (please create a new issue for that) + ||REGISTER_COUNTER ||23|| Not used (please create a new issue for that)|| - INCREMENT_COUNTER 24 Not used (please create a new issue for that) + ||INCREMENT_COUNTER ||24|| Not used (please create a new issue for that)|| - LOG 25 Not implemented in Pipes, but in streaming it sends child logging to the Java task. + ||LOG ||25|| Not implemented in Pipes, but in streaming it sends child logging to the Java task.|| == Acknowledgements == @@ -89, +88 @@ A sample communication could look like this // are comments and are not parts of the protocol: - + {{{ %0%= // start op code from java task 0 // protocol number %1%= // set bsp configuration op code 2 / number of configuration items to read hama.bsp.child.opts // sample key from configuration -Xmx512m // sample value from configuration + }}} Now the forked child needs to acknowledge the start OP code by writing: + {{{ %ACK_0%= // ACK’d start code + }}} After the acknowledgement, we immediately start with the setup function. + + - Setup/BSP/Cleanup Sequence + == Setup/BSP/Cleanup Sequence == A normal BSP task has three major steps: @@ -118, +122 @@ Of course here is the example with the setup function: + {{{ %3%= // setup op code from java task // now call the users setup function, any communication can happen here %ACK_3%= // ACK’d setup code + }}} The three steps are called in sequential order, so after you have ACK’d the end of the setup, the Java code will immediately start telling you about that you need to start the bsp function. This applies also for the transition between bsp and cleanup function. @@ -130, +136 @@ Here is the table for the communication of the currently implemented BSP functionality: - BSP Primitive Operation sequence Response + ||BSP Primitive ||Operation sequence|| Response|| + ||Send || %10%= \n Destination peer name \n Message as text |||| + ||Sync ||%11%=|| After sync a special ACK will be in the next line: “%11%=_SUCCESS”. So you should block until you received this.|| - Send - %10%= - - Destination peer name - - Message as text - Sync - %11%= After sync a special ACK will be in the next line: “%11%=_SUCCESS” - So you should block until you received this. - getCurrentMessage - %8%= “%%-1%%” if no message was found, else the message as text + ||getCurrentMessage ||%8%=|| “%%-1%%” if no message was found, else the message as text|| - getSuperstepCount - %16%= The raw Superstep number + ||getSuperstepCount ||%16%=|| The raw Superstep number|| - getMessageCount - %9%= The raw message count + ||getMessageCount ||%9%=|| The raw message count|| - getPeerName - %13%= - - Followed by an index (plain integer), -1 for the name of this peer. The name of the peer as text + ||getPeerName ||%13%= \n Followed by an index (plain integer), -1 for the name of this peer. ||The name of the peer as text|| - getPeerIndex - %14%= The raw index in the peer array + ||getPeerIndex ||%14%=|| The raw index in the peer array|| - getAllPeerNames - %12%= A line how many peers are there (plain number), then n-lines with the peer names + ||getAllPeerNames ||%12%=|| A line how many peers are there (plain number), then n-lines with the peer names|| - getPeerCount - %15%= Plain number of how many peers are there + ||getPeerCount ||%15%=|| Plain number of how many peers are there|| + ||writeKeyValue ||%7%= \n Key as line \n Value as line |||| - writeKeyValue - %7%= - - Key as line - - Value as line - readKeyValue - %6%= Key and Value in two separate following lines. If no input available, both are equal to “%%-1%%”. + ||readKeyValue ||%6%=|| Key and Value in two separate following lines. If no input available, both are equal to “%%-1%%”.|| - reopenInput - %17%= + ||reopenInput||%17%= |||| == Closing Sequences == @@ -157, +157 @@ After receiving this, the Java process will do the normal cleanup and finish the task itself, after sending these codes you can gracefully shutdown your process by letting it exit with a zero status. The end sequence looks like this: - + {{{ %22%= // task done %21%= // completely done - + }}} Congratulation, you are now able to implement Hama streaming in other languages. @@ -178, +178 @@ Looks basically like that: + {{{ className = sys.argv[1] module = __import__(className) class_ = getattr(module, className) bspInstance = class_() + }}} Now you can pass this instance to the handler that calls back the user functions in the python implementation this is done by the peer itself.
