[
https://issues.apache.org/jira/browse/HAMA-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019224#comment-13019224
]
Thomas Jungblut commented on HAMA-258:
--------------------------------------
What is the actual difference between these two functions?
IMHO compute() is called once for each vertex, whereas our bsp() is called
once for a whole task.
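
To make the distinction concrete, here is a rough sketch of how a per-task
bsp() could drive a per-vertex compute(). All of the names below (Vertex,
VertexProgram, VertexTask) are made up for illustration and are not existing
Hama classes; the barrier sync between supersteps is only hinted at in a
comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-vertex callback, in the style of a compute() function. */
interface VertexProgram {
    void compute(Vertex vertex, int superstep);
}

/** Hypothetical vertex holding an id and a mutable value. */
class Vertex {
    final String id;
    int value;

    Vertex(String id, int value) {
        this.id = id;
        this.value = value;
    }
}

/** Hypothetical task that owns one partition of the vertices. */
class VertexTask {
    private final List<Vertex> localVertices = new ArrayList<>();
    private final VertexProgram program;

    VertexTask(VertexProgram program) {
        this.program = program;
    }

    void addVertex(Vertex v) {
        localVertices.add(v);
    }

    /** Called once per task; runs compute() on every local vertex in each superstep. */
    void bsp(int supersteps) {
        for (int step = 0; step < supersteps; step++) {
            for (Vertex v : localVertices) {
                program.compute(v, step);
            }
            // A real BSP task would synchronize with its peers here (barrier).
        }
    }
}
```

So a vertex-centric API could be layered on top of bsp() rather than replace
it.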
We should provide a Reader class that reads SequenceFiles/TextFiles and HBase
tables.
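
One possible shape for such a reader, sketched with plain java.io only. The
RecordReader interface, the KeyValue holder and TextLineReader are
hypothetical names, not existing Hama or Hadoop classes; SequenceFile and
HBase variants would implement the same interface and are left out here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical key/value holder returned by every reader. */
class KeyValue<K, V> {
    final K key;
    final V value;

    KeyValue(K key, V value) {
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical common interface for all input sources. */
interface RecordReader<K, V> extends AutoCloseable {
    /** Returns the next record, or null when the input is exhausted. */
    KeyValue<K, V> next() throws IOException;

    @Override
    void close() throws IOException;
}

/** Text input: the key is the line number (starting at 0), the value is the line. */
class TextLineReader implements RecordReader<Long, String> {
    private final BufferedReader in;
    private long lineNumber = 0;

    TextLineReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    @Override
    public KeyValue<Long, String> next() throws IOException {
        String line = in.readLine();
        return line == null ? null : new KeyValue<>(lineNumber++, line);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```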
How should we do the partitioning?
* Block partitioning like Hadoop - this is not very flexible and depends on
the locality of the data
* Key partitioning (like in my blog, you can just send the message to the groom
that contains this vertexID) // this would be better for HBase input, or for
SequenceFiles.
Or either of the two above, but distributing the data via messaging; this
would be slower than writing it into an HDFS block, though.
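
For the key partitioning option, a minimal sketch of mapping a vertexID to the
peer that owns it could look like this. KeyPartitioner and peerFor are
hypothetical names, and hashing the ID modulo the number of peers is just one
assumed scheme, not necessarily what we would end up with.

```java
/** Hypothetical helper: maps a vertex ID to the index of the peer that owns it. */
final class KeyPartitioner {
    private KeyPartitioner() {
    }

    static int peerFor(String vertexId, int numPeers) {
        if (numPeers <= 0) {
            throw new IllegalArgumentException("numPeers must be positive");
        }
        // floorMod keeps the result in [0, numPeers) even for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), numPeers);
    }
}
```

Every peer can evaluate this locally, so a message for a vertex can be sent
straight to its owner without a lookup table.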
How about a simple output system?
We can provide an output collector in the BSPPeer, and each peer has its own
output file in HDFS.
If a user wants to write output, they can simply do so and don't have to code
a SequenceFile writer.
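
A rough sketch of what such a collector might look like. PeerOutputCollector
is a hypothetical class; it writes tab-separated text to a local file through
java.nio as a stand-in for HDFS, and the part-NNNNN file naming is only an
assumption borrowed from Hadoop conventions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical collector: one instance per peer, one output file per peer. */
class PeerOutputCollector implements AutoCloseable {
    private final Path file;
    private final BufferedWriter writer;

    PeerOutputCollector(Path outputDir, int peerIndex) throws IOException {
        Files.createDirectories(outputDir);
        // Assumed naming scheme: part-00000, part-00001, ... (one file per peer).
        this.file = outputDir.resolve(String.format("part-%05d", peerIndex));
        this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Appends one key/value pair as a tab-separated line. */
    void collect(String key, String value) throws IOException {
        writer.write(key);
        writer.write('\t');
        writer.write(value);
        writer.newLine();
    }

    Path file() {
        return file;
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```

The user code would then only call collect(key, value) and never touch a
writer directly.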
> Design a input and output system
> --------------------------------
>
> Key: HAMA-258
> URL: https://issues.apache.org/jira/browse/HAMA-258
> Project: Hama
> Issue Type: New Feature
> Components: bsp
> Affects Versions: 0.3.0
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Fix For: 0.3.0
>
>
> This issue will handle the input and output system with data splitter.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira