[ 
https://issues.apache.org/jira/browse/HAMA-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019224#comment-13019224
 ] 

Thomas Jungblut commented on HAMA-258:
--------------------------------------

What is the actual difference between these two functions? 
IMHO compute() is called once for each vertex, whereas bsp() is called once 
for a whole task.
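
To make the distinction concrete, here is a minimal sketch of how a per-task 
bsp() could drive a per-vertex compute(). The names VertexComputeSketch and 
TaskBspSketch are hypothetical and are not Hama or Pregel API; the real 
signatures would also carry messages and a peer handle.

```java
import java.util.List;

// Hypothetical per-vertex callback: invoked once for every vertex.
interface VertexComputeSketch {
    void compute(String vertexId);
}

// Hypothetical per-task driver: a single bsp() call covers all vertices
// that were assigned to this task.
final class TaskBspSketch {
    private final List<String> localVertices;

    TaskBspSketch(List<String> localVertices) {
        this.localVertices = localVertices;
    }

    // Called once per task; fans out to compute() once per local vertex.
    void bsp(VertexComputeSketch program) {
        for (String vertexId : localVertices) {
            program.compute(vertexId);
        }
    }
}
```

In this reading, compute() is just the body of the loop that a user would 
otherwise write by hand inside bsp().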

We should provide a Reader class that reads SequenceFiles/TextFiles and HBase 
tables.

How should we do the partitioning?
* Block partitioning like Hadoop: this is not very flexible, since it depends 
on the locality of the data
* Key partitioning (as in my blog: you can just send the message to the groom 
that contains this vertexID). This would be better for HBase input, or for 
SequenceFiles.

Alternatively, either of these two could be done purely with messaging, but 
this would be slower than writing the data into an HDFS block.
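
A minimal sketch of the key partitioning idea, assuming vertex IDs are 
strings and peers (grooms) are addressed by an index from 0 to numPeers - 1. 
The class KeyPartitioningSketch and its methods are hypothetical, not Hama 
API, and hashing the ID is only one possible way to pick the owning peer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hama API: maps each vertex ID to an owning peer.
final class KeyPartitioningSketch {

    // Math.floorMod keeps the result in [0, numPeers) even when
    // hashCode() is negative.
    static int peerFor(String vertexId, int numPeers) {
        if (numPeers <= 0) {
            throw new IllegalArgumentException("numPeers must be positive");
        }
        return Math.floorMod(vertexId.hashCode(), numPeers);
    }

    // Groups vertex IDs into one bucket per peer.
    static List<List<String>> partition(List<String> vertexIds, int numPeers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String id : vertexIds) {
            buckets.get(peerFor(id, numPeers)).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<List<String>> buckets = partition(Arrays.asList("a", "b", "c", "d"), 3);
        for (int p = 0; p < buckets.size(); p++) {
            System.out.println("peer " + p + " -> " + buckets.get(p));
        }
    }
}
```

Because every peer can evaluate peerFor() locally, a message for a vertex can 
be addressed without any lookup table; the trade-off is that data locality is 
ignored.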

How about a simple output system?
We can provide an output collector in the BSPPeer, and each peer has its own 
output file in HDFS. 
If a user wants to write output, they can simply do so and don't have to code 
a SequenceFile writer.
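
A rough sketch of what such a collector could look like. Everything here is 
an assumption for illustration: the class name PeerOutputCollectorSketch, the 
part-NNNNN file naming, the tab-separated text format, and the use of the 
local filesystem through java.nio in place of HDFS.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Hama API: one collector (and one file) per peer.
final class PeerOutputCollectorSketch implements Closeable {
    private final Path file;
    private final BufferedWriter writer;

    // Each peer writes to its own file, so peers never contend on output.
    PeerOutputCollectorSketch(Path outputDir, int peerIndex) throws IOException {
        Files.createDirectories(outputDir);
        this.file = outputDir.resolve(String.format("part-%05d", peerIndex));
        this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    // The only call user code needs; the file format is hidden in here.
    void collect(String key, String value) throws IOException {
        writer.write(key);
        writer.write('\t');
        writer.write(value);
        writer.newLine();
    }

    Path file() {
        return file;
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```

User code would only call collect(key, value); opening, naming and closing 
the per-peer file stays inside the framework.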

> Design a input and output system
> --------------------------------
>
>                 Key: HAMA-258
>                 URL: https://issues.apache.org/jira/browse/HAMA-258
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.3.0
>
>
> This issue will handle the input and output system with data splitter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
