[jira] [Commented] (HAMA-700) BSPPartitioner should be configurable to be optional and allow input format conversion

Suraj Menon (JIRA) Mon, 14 Jan 2013 21:20:23 -0800

    [ 
https://issues.apache.org/jira/browse/HAMA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553497#comment-13553497
 ]


Suraj Menon commented on HAMA-700:
----------------------------------

> 1) Why did you choose to introduce the getPartitionId() method in 
> PartitioningRunner?
Two reasons. 
- The Partitioner is used in the partitioning job as well as in the 
user-submitted job. Depending on the RecordConverter class, the type of the 
input records for the partitioning job can differ from the type of the input 
records for the user job. 
- In the future, when we have scalable messaging and a better scheduler, we can 
inject a partitioning superstep instead of starting a new partitioner job. For 
HAMA-561, where the partitions are not changed but only converted to 
Vertex, the partition id would be the same as the peer index. 
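
A minimal sketch of the first point, in plain Java with no Hama dependency. 
The names Partitioner, RecordConverter, HashPartitioner, TextToPairConverter and 
identityPartitionId below are illustrative stand-ins and assumptions, not the 
signatures from the patch. The idea shown is only that the converter, which 
knows both the raw and the converted record type, asks the partitioner for the 
partition id on the converted record.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class PartitionIdSketch {

  /** Hypothetical stand-in for a partitioner over converted records. */
  interface Partitioner<K, V> {
    int getPartition(K key, V value, int numTasks);
  }

  /** Hypothetical converter: raw input record -> record type of the user job. */
  interface RecordConverter<RK, RV, K, V> {
    Map.Entry<K, V> convertRecord(RK rawKey, RV rawValue);

    // The partition id is computed on the converted record, so the same
    // partitioner can serve the partitioning job and the user job.
    default int getPartitionId(Map.Entry<K, V> converted,
        Partitioner<K, V> partitioner, int numTasks) {
      return partitioner.getPartition(converted.getKey(), converted.getValue(), numTasks);
    }
  }

  /** Hash partitioner; floorMod keeps the result in [0, numTasks). */
  static class HashPartitioner<K, V> implements Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numTasks) {
      return Math.floorMod(key.hashCode(), numTasks);
    }
  }

  /** Assumed input shape: a text line "vertexId<TAB>payload". */
  static class TextToPairConverter implements RecordConverter<Long, String, String, String> {
    @Override
    public Map.Entry<String, String> convertRecord(Long offset, String line) {
      int tab = line.indexOf('\t');
      String id = tab < 0 ? line : line.substring(0, tab);
      String payload = tab < 0 ? "" : line.substring(tab + 1);
      return new SimpleEntry<>(id, payload);
    }
  }

  /** Case where partitions are kept as they are: the id is the peer index. */
  static int identityPartitionId(int peerIndex) {
    return peerIndex;
  }

  public static void main(String[] args) {
    TextToPairConverter converter = new TextToPairConverter();
    Map.Entry<String, String> rec = converter.convertRecord(0L, "v1\tv2 v3");
    int id = converter.getPartitionId(rec, new HashPartitioner<String, String>(), 4);
    System.out.println(rec.getKey() + " -> partition " + id);
  }
}
```

With this shape, swapping the converter changes the raw record type without 
touching the partitioner, and the identity variant covers the case where only 
conversion is needed.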

> 2) The goals of the patch:
- to provide means in bsp core that could be reused in the graph module to do 
run-time partitioning
- to make the graph job independent of the user data input 
format (TextInputFormat, SequenceFileFormat ...)
I am sorry, but I am a little lost on the suggestion. We chose to implement 
run-time partitioning in the partition runner because ultimately both do the 
same thing. 
I am already guilty of doubling the storage of vertices. We can consider 
intermediate stages (that write local files instead of HDFS) when we implement 
BSPPartitioner injected into the execution in terms of the task count specified 
for the job.

> 3) It is up for vote. Vertex.write and readFields use it; we can use the 
> vertex.runner.conf

> 4) Sure, I just used it from the previous version. If you make it generic, 
> then you have to specify the classes in the configuration of the job.

> 5) It is used when the user wants to just convert the records but run with 
> the number of tasks equal to the count of splits.
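
A small sketch of the behaviour described in 5), under stated assumptions. The 
Mode enum and the numTasks helper are hypothetical names introduced for 
illustration and do not come from the patch. In convert-only mode records stay 
in their original split, so the task count follows the split count; otherwise 
the sketch assumes the task count requested for the job is used.

```java
public class TaskCountSketch {

  /** Hypothetical modes, named here only for illustration. */
  enum Mode { PARTITION, CONVERT_ONLY }

  /**
   * Picks the number of tasks for the job.
   * CONVERT_ONLY: records are converted in place, one task per input split.
   * PARTITION: assumed to use the task count requested for the job.
   */
  static int numTasks(Mode mode, int requestedTasks, int numSplits) {
    if (mode == Mode.CONVERT_ONLY) {
      return numSplits;
    }
    return requestedTasks;
  }

  public static void main(String[] args) {
    System.out.println(numTasks(Mode.CONVERT_ONLY, 8, 3));
    System.out.println(numTasks(Mode.PARTITION, 8, 3));
  }
}
```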

The direction we have taken in partitioning is still open to suggestions and a 
vote. I committed directly because it was difficult for me to keep the patch up 
to date with the commits on the same issue. Starting with the next patch, I will 
follow the usual sequence: upload the patch, get it reviewed, and then commit. 
Thanks for reviewing.
                
> BSPPartitioner should be configurable to be optional and allow input format 
> conversion
> --------------------------------------------------------------------------------------
>
>                 Key: HAMA-700
>                 URL: https://issues.apache.org/jira/browse/HAMA-700
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp core
>            Reporter: Suraj Menon
>            Assignee: Suraj Menon
>             Fix For: 0.6.1
>
>         Attachments: HAMA-700.patch_Jan7, HAMA-700.patch.v2, HAMA-700-v1.patch
>
>
> There should be a provision for skipping the PartitionRunner if needed. 
> Also, we can have a RecordConverter interface so that the PartitionRunner can 
> read the input in any format and create new splits. 

