Can't post to JIRA because it is down or has high latency.

I dislike the idea as well, but it is the most efficient way to write the vertices. Consider the Wikipedia link set: 1 GB of text data as an adjacency list. With the current trunk version it takes up to 10 GB. I have not measured how it looks with that patch, but I assume it will be less than 1 GB. Suppose you have a 64 MB chunk size in HDFS: 10 GB / 64 MB means 160 BSP tasks to be launched, as opposed to 16 (1 GB / 64 MB) in the best case. I don't know if that's an argument for you.

Compatibility with MapReduce shouldn't be our first aim; we can make a BSP job out of the random graph generator. However, it might be worth considering that Giraph supports all input formats, and that we could have an input key/value-to-vertex parser that runs when loading vertices. This would shift the responsibility to the user, and we would remove the Writability of the vertices, thus removing the VertexWritable classes.
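To make the parser idea a bit more concrete, here is a rough sketch of what I have in mind. All names in it (VertexParser, SimpleVertex, AdjacencyListParser) are made up for illustration and are not existing Hama or Giraph classes; it also assumes a tab-separated adjacency list where the first token is the vertex id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only: none of these types exist in Hama or Giraph. */
public class VertexParserSketch {

  /** Plain in-memory vertex: an id plus its out-edges, no Writable involved. */
  static final class SimpleVertex {
    final String id;
    final List<String> edges;

    SimpleVertex(String id, List<String> edges) {
      this.id = id;
      this.edges = edges;
    }
  }

  /** User-supplied hook that turns one input key/value record into a vertex. */
  interface VertexParser<K, V> {
    SimpleVertex parse(K key, V value);
  }

  /**
   * Example parser for text input, assuming lines of the form
   * "vertexId<TAB>neighbour1<TAB>neighbour2...". The key (for instance a
   * byte offset from a text input format) is ignored.
   */
  static final class AdjacencyListParser implements VertexParser<Long, String> {
    @Override
    public SimpleVertex parse(Long key, String value) {
      String[] tokens = value.split("\t");
      // Everything after the first token is treated as an out-edge.
      List<String> edges =
          new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
      return new SimpleVertex(tokens[0], edges);
    }
  }

  public static void main(String[] args) {
    VertexParser<Long, String> parser = new AdjacencyListParser();
    SimpleVertex v = parser.parse(0L, "Berlin\tGermany\tEurope");
    System.out.println(v.id + " -> " + v.edges); // Berlin -> [Germany, Europe]
  }
}
```

The point is that the framework would only call parse() while loading, so the raw text stays at its original size on HDFS and the user decides how a record maps to a vertex.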
If you have a good trade-off idea, let me know.

2012/5/24 Edward J. Yoon (JIRA) <[email protected]>

>
>     [
> https://issues.apache.org/jira/browse/HAMA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282244#comment-13282244]
>
> Edward J. Yoon commented on HAMA-580:
> -------------------------------------
>
> I dislike this idea. This makes programming complex and discourages use of
> existing Mapper/Reducer e.g., Reducer, LongSumReducer, ...
>
> > Improve input of graph module
> > -----------------------------
> >
> >                 Key: HAMA-580
> >                 URL: https://issues.apache.org/jira/browse/HAMA-580
> >             Project: Hama
> >          Issue Type: Improvement
> >          Components: graph
> >    Affects Versions: 0.5.0
> >            Reporter: Thomas Jungblut
> >            Assignee: Thomas Jungblut
> >             Fix For: 0.5.0
> >
> >         Attachments: HAMA-580.patch, HAMA-580_1.patch
> >
> >
> > Currently it is too verbose, the wikipedia dataset is going to be
> > bloated from 0.95gb to 5gb just because it is writing the classes x-times.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>

--
Thomas Jungblut
Berlin <[email protected]>
