Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The "HamaStreaming" page has been changed by thomasjungblut: http://wiki.apache.org/hama/HamaStreaming?action=diff&rev1=3&rev2=4

In any case you should now find a "HamaStreaming" folder in your Hama home directory which contains several scripts.
+ Now we have to upload these scripts to HDFS:
+ 
+ {{{
+ hadoop/bin/hadoop fs -mkdir /tmp/PyStreaming/
+ hadoop/bin/hadoop fs -copyFromLocal HamaStreaming/* /tmp/PyStreaming/
+ }}}
+ 
Let's start by executing the usual Hello World application that already ships with streaming:

{{{
- bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles HamaStreaming/*.py -output /tmp/pystream-out/ -program HamaStreaming/BSPRunner.py -programArgs HamaStreaming/HelloWorldBSP.py
+ bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
}}}
+ This will start 2 BSP tasks in streaming mode. In streaming, a child process is forked from the usual BSP Java task; in this case that yields a new task that starts python3.2 with the .py files from HDFS. The noteworthy part is that you pass a runner class which takes care of all the protocol communication, while your user program is passed as the first program argument.
+ This works because Python starts the runner script in a working directory that contains the cache files. They are therefore implicitly importable and the whole computation can work, which is why you don't have to provide a path for HelloWorldBSP (note that the .py extension is not needed either, because of the reflective import).
+ 
+ Hopefully you should see something along these lines:
+ 
+ {{{
+ 12/09/17 19:06:31 INFO pipes.Submitter: Streaming enabled!
+ 12/09/17 19:06:33 INFO bsp.BSPJobClient: Running job: job_201209171906_0001
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: Job complete: job_201209171906_0001
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: The total number of supersteps: 15
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: Counters: 8
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: LAUNCHED_TASKS=2
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: SUPERSTEP_SUM=15
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: COMPRESSED_BYTES_SENT=3310
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=2805
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: COMPRESSED_BYTES_RECEIVED=3310
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=60
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=30
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: TASK_OUTPUT_RECORDS=28
+ }}}
+ 
+ And now you can view the output of your job with:
+ 
+ {{{
+ hadoop/bin/hadoop fs -cat /tmp/pystream-out/part-00001
+ }}}
+ 
+ In my case this looks like this:
+ 
+ {{{
+ Hello from localhost:61002 in superstep 0
+ Hello from localhost:61001 in superstep 0
+ Hello from localhost:61001 in superstep 1
+ Hello from localhost:61002 in superstep 1
+ [...]
+ Hello from localhost:61001 in superstep 14
+ Hello from localhost:61002 in superstep 14
+ }}}
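To illustrate the reflective import mentioned above: a runner can resolve the user's BSP class from the bare module name (no path, no .py suffix) with Python's importlib, because the cache files sit in the task's working directory, which is on sys.path. The sketch below only demonstrates that mechanism; it is not Hama's actual BSPRunner.py, and the convention that the class is named like its module is an assumption made here.

```python
import importlib
import os
import sys
import tempfile

def load_user_bsp(name):
    """Resolve a BSP class from a bare module name, the way a streaming
    runner could: importlib searches sys.path, which includes the working
    directory that holds the cached scripts, so "HelloWorldBSP" finds
    HelloWorldBSP.py without any path or .py suffix."""
    module = importlib.import_module(name)
    # Assumed convention for this sketch: the class is named like its module.
    return getattr(module, name)

# Simulate the cache-file working directory with a throwaway script.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "HelloWorldBSP.py"), "w") as f:
    f.write(
        "class HelloWorldBSP:\n"
        "    def bsp(self, peer):\n"
        "        return 'Hello from ' + peer\n"
    )
sys.path.insert(0, workdir)

bsp_class = load_user_bsp("HelloWorldBSP")  # note: no ".py" suffix
print(bsp_class().bsp("localhost:61001"))   # Hello from localhost:61001
```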
