Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change 
notification.

The "HamaStreaming" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/HamaStreaming?action=diff&rev1=3&rev2=4

  
  In any case you should now find a "HamaStreaming" folder in your Hama home 
directory which contains several scripts.
  
+ Now we have to upload these scripts to HDFS:
+ 
+ {{{
+ hadoop/bin/hadoop fs -mkdir /tmp/PyStreaming/
+ hadoop/bin/hadoop fs -copyFromLocal HamaStreaming/* /tmp/PyStreaming/
+ }}}
+ 
  Let's start by executing the usual Hello World application that already ships 
with streaming:
  
  {{{
- bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles HamaStreaming/*.py -output /tmp/pystream-out/ -program HamaStreaming/BSPRunner.py -programArgs HamaStreaming/HelloWorldBSP.py
+ bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
  }}}
  
+ This starts 2 BSP tasks in streaming mode. In streaming mode, each regular 
BSP Java task forks a child process. Here the child process is started with 
python3.2 and uses the py files from HDFS. Note that the program you pass is a 
runner script (BSPRunner.py) that takes care of all the protocol communication; 
your own program is passed as the first program argument.
+ This works because python starts the runner in a work directory that contains 
the cache files, so they can be imported implicitly. That is why you don't have 
to provide a path for HelloWorldBSP (the .py extension is not needed either, 
because the runner imports the module reflectively by name).
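
To illustrate why a bare module name is enough, here is a minimal sketch of a reflective import. It is not the code of BSPRunner.py: the helper `load_user_class` is hypothetical, and the sketch assumes that the user module defines a class with the same name as the module.

```python
import importlib
import os
import sys
import tempfile


def load_user_class(module_name, work_dir):
    """Import `module_name` from `work_dir` and return the class of the same name.

    Hypothetical helper: assumes the module defines a class named like the
    module itself (for example HelloWorldBSP.py defining HelloWorldBSP).
    """
    # Make the directory holding the cached .py files importable.
    if work_dir not in sys.path:
        sys.path.insert(0, work_dir)
    # Pick up files that were created after the interpreter started.
    importlib.invalidate_caches()
    # Import by name: no path and no ".py" suffix required.
    module = importlib.import_module(module_name)
    return getattr(module, module_name)


if __name__ == "__main__":
    # Simulate a work directory that contains one cached user program.
    with tempfile.TemporaryDirectory() as work_dir:
        source = "class HelloWorldBSP:\n    pass\n"
        with open(os.path.join(work_dir, "HelloWorldBSP.py"), "w") as handle:
            handle.write(source)
        user_class = load_user_class("HelloWorldBSP", work_dir)
        print(user_class.__name__)
```

Because the lookup goes through the module search path rather than a file path, any program shipped alongside the runner in the cache files can be selected by name alone.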
+ 
+ You should see something along these lines:
+ 
+ {{{
+ 12/09/17 19:06:31 INFO pipes.Submitter: Streaming enabled!
+ 12/09/17 19:06:33 INFO bsp.BSPJobClient: Running job: job_201209171906_0001
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: Job complete: job_201209171906_0001
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: The total number of supersteps: 15
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: Counters: 8
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=15
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     COMPRESSED_BYTES_SENT=3310
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=2805
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     COMPRESSED_BYTES_RECEIVED=3310
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=60
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=30
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=28
+ 
+ }}}
+ 
+ And now you can view the output of your job with:
+ 
+ {{{
+ hadoop/bin/hadoop fs -cat /tmp/pystream-out/part-00001
+ }}}
+ 
+ In my case the output looks like this:
+ 
+ {{{
+ Hello from localhost:61002 in superstep 0     
+ Hello from localhost:61001 in superstep 0     
+ Hello from localhost:61001 in superstep 1
+ Hello from localhost:61002 in superstep 1
+ [...]
+ Hello from localhost:61001 in superstep 14    
+ Hello from localhost:61002 in superstep 14    
+ }}}
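
The shape of a program that produces such output can be sketched as follows. This is a self-contained simulation, not the shipped HelloWorldBSP.py: `FakePeer`, its `write` and `sync` methods, and `hello_world_bsp` are hypothetical stand-ins and should not be read as the HamaStreaming API. The count of 15 supersteps is taken from the job log above.

```python
class FakePeer:
    """Hypothetical stand-in for a BSP peer; NOT the HamaStreaming API."""

    def __init__(self, name):
        self.name = name
        self.superstep = 0
        self.records = []

    def write(self, key, value):
        # Collect output records in memory instead of writing to HDFS.
        self.records.append((key, value))

    def sync(self):
        # A real peer would wait at the barrier; here we only count.
        self.superstep += 1


def hello_world_bsp(peer, supersteps=15):
    """Emit one greeting per superstep, then enter the barrier."""
    for _ in range(supersteps):
        line = "Hello from {} in superstep {}".format(peer.name, peer.superstep)
        peer.write(line, "")  # empty value is an assumption
        peer.sync()


if __name__ == "__main__":
    peer = FakePeer("localhost:61001")
    hello_world_bsp(peer)
    for key, _ in peer.records:
        print(key)
```

Each task runs the same loop with its own peer name, which is why the real output interleaves lines from localhost:61001 and localhost:61002.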
+ 
