> However, it might be worth considering that Giraph supports all
> InputFormats and has an input key/value-to-vertex parser that runs when
> loading vertices.
> This would shift the responsibility to the user, and we would remove the
> Writability of the vertices, thus removing the VertexWritable classes.
+1

On Thu, May 24, 2012 at 4:30 PM, Thomas Jungblut wrote:
> Can't post to JIRA because it is down or has high latency.
>
> I dislike the idea as well, but it is the optimal way to write the
> vertices.
> Consider the Wikipedia link set: 1 GB of text data as an adjacency list.
> With the current trunk version it grows to as much as 10 GB.
> I haven't checked exactly how large it is with that patch, but I assume
> it will be less than 1 GB.
> Suppose you have a 64 MB block size in HDFS: that means 160 BSP tasks
> would be launched, as opposed to 16 in the optimal case.
> I don't know if that's an argument for you. Compatibility with MapReduce
> shouldn't be our first aim; we can make a BSP job out of the random graph
> generator.
> However, it might be worth considering that Giraph supports all
> InputFormats and has an input key/value-to-vertex parser that runs when
> loading vertices.
> This would shift the responsibility to the user, and we would remove the
> Writability of the vertices, thus removing the VertexWritable classes.
>
> If you have a good trade-off idea, let me know.
>
>
> 2012/5/24 Edward J. Yoon (JIRA)
>
>>
>>     [ https://issues.apache.org/jira/browse/HAMA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282244#comment-13282244 ]
>>
>> Edward J. Yoon commented on HAMA-580:
>> -------------------------------------
>>
>> I dislike this idea. This makes programming complex and discourages the
>> use of existing Mappers/Reducers, e.g., Reducer, LongSumReducer, ...
>>
>> > Improve input of graph module
>> > -----------------------------
>> >
>> >                 Key: HAMA-580
>> >                 URL: https://issues.apache.org/jira/browse/HAMA-580
>> >             Project: Hama
>> >          Issue Type: Improvement
>> >          Components: graph
>> >    Affects Versions: 0.5.0
>> >            Reporter: Thomas Jungblut
>> >            Assignee: Thomas Jungblut
>> >             Fix For: 0.5.0
>> >
>> >         Attachments: HAMA-580.patch, HAMA-580_1.patch
>> >
>> >
>> > Currently it is too verbose: the Wikipedia dataset is bloated from
>> > 0.95 GB to 5 GB just because the classes are written over and over.
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators:
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>>
>>
>
>
> --
> Thomas Jungblut
> Berlin

--
Best Regards, Edward J. Yoon
@eddieyoon
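Thomas's 160-versus-16 task count follows from launching one BSP task per HDFS block. A minimal sketch, assuming exactly one input split (and so one task) per 64 MB block and rounding up for a partially filled last block; `SplitEstimate` and `estimateTasks` are illustrative names, not Hama API:

```java
// Illustrative only: estimates task count assuming one split per HDFS block.
public class SplitEstimate {

    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Ceiling division: a partially filled last block still needs its own task.
    static long estimateTasks(long inputBytes, long blockBytes) {
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB;
        // Bloated trunk output (~10 GB) vs. plain adjacency-list text (~1 GB).
        System.out.println("10 GB input: " + estimateTasks(10 * GB, blockSize) + " tasks");
        System.out.println(" 1 GB input: " + estimateTasks(1 * GB, blockSize) + " tasks");
    }
}
```

Running it prints 160 tasks for 10 GB and 16 tasks for 1 GB, so every factor of input bloat becomes the same factor in tasks launched.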

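The input key/value-to-vertex parser proposed in this thread could look roughly like the following. This is a hypothetical sketch: `VertexParser`, `SimpleVertex`, and `AdjacencyListParser` are invented names, not Hama or Giraph API, and the tab-separated adjacency-list line format is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in Hama or Giraph.
public class VertexParserSketch {

    // Minimal stand-in for a graph vertex: an ID plus outgoing edge targets.
    static final class SimpleVertex {
        final String id;
        final List<String> edges;

        SimpleVertex(String id, List<String> edges) {
            this.id = id;
            this.edges = edges;
        }
    }

    // User-supplied callback that turns one input record into one vertex,
    // so vertex classes never have to be serialized to disk.
    interface VertexParser<K, V> {
        SimpleVertex parse(K key, V value);
    }

    // Parses "vertexId<TAB>neighbor1<TAB>neighbor2 ..."; the key
    // (e.g. a byte offset from a line-oriented input format) is ignored.
    static final class AdjacencyListParser implements VertexParser<Long, String> {
        @Override
        public SimpleVertex parse(Long offset, String line) {
            String[] parts = line.split("\t");
            List<String> edges = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            return new SimpleVertex(parts[0], edges);
        }
    }

    public static void main(String[] args) {
        VertexParser<Long, String> parser = new AdjacencyListParser();
        SimpleVertex v = parser.parse(0L, "Berlin\tGermany\tSpree");
        System.out.println(v.id + " -> " + v.edges); // prints "Berlin -> [Germany, Spree]"
    }
}
```

Because vertices are built directly from whatever records the InputFormat yields, the stored input can stay plain text and no vertex class needs to be Writable. The trade-off is the one Thomas names: each user has to supply a parser for their format.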