Great that you share that view, so let's do it like this.

2012/5/24 Edward J. Yoon <[email protected]>
> > However it might be a good thing to consider that giraph is supporting all
> > inputformats and have a input key/value to vertex parser that runs when
> > loading vertices.
> > This would shift the responsibility to the user and we would remove
> > Writability of the vertices, thus removing the VertexWritable classes.
>
> +1
>
> On Thu, May 24, 2012 at 4:30 PM, Thomas Jungblut
> <[email protected]> wrote:
> > Can't post to jira because it is down or has high latency.
> >
> > I dislike the idea as well, but it is the most optimal case to write the
> > vertices.
> > Consider the Wikipedia linkset, 1gb of text data as adjacency list.
> > With current trunk version it has at most 10gb.
> > I have no clear check of how it is with that patch, but I assume that it
> > will be less than 1gb.
> > Suppose you have 64mb chunksize in HDFS, meaning 160 bsp tasks to be
> > launched, as opposed to 16 for the most optimal case.
> > I don't know if that's an argument for you. Compatibility to MapReduce
> > shouldn't be our first aim, we can make a BSP job out of the random graph
> > generator.
> > However it might be a good thing to consider that giraph is supporting all
> > inputformats and have a input key/value to vertex parser that runs when
> > loading vertices.
> > This would shift the responsibility to the user and we would remove
> > Writability of the vertices, thus removing the VertexWritable classes.
> >
> > If you have a good trade-off idea, let me know.
> >
> >
> > 2012/5/24 Edward J. Yoon (JIRA) <[email protected]>
> >
> >>
> >> [
> >> https://issues.apache.org/jira/browse/HAMA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282244#comment-13282244 ]
> >>
> >> Edward J. Yoon commented on HAMA-580:
> >> -------------------------------------
> >>
> >> I dislike this idea. This makes programming complex and discourages use of
> >> existing Mapper/Reducer e.g., Reducer, LongSumReducer, ...
> >>
> >> > Improve input of graph module
> >> > -----------------------------
> >> >
> >> >                 Key: HAMA-580
> >> >                 URL: https://issues.apache.org/jira/browse/HAMA-580
> >> >             Project: Hama
> >> >          Issue Type: Improvement
> >> >          Components: graph
> >> >    Affects Versions: 0.5.0
> >> >            Reporter: Thomas Jungblut
> >> >            Assignee: Thomas Jungblut
> >> >             Fix For: 0.5.0
> >> >
> >> >         Attachments: HAMA-580.patch, HAMA-580_1.patch
> >> >
> >> >
> >> > Currently it is too verbose, the wikipedia dataset is going to be
> >> bloated from 0.95gb to 5gb just because it is writing the classes x-times.
> >>
> >> --
> >> This message is automatically generated by JIRA.
> >> If you think it was sent incorrectly, please contact your JIRA
> >> administrators:
> >> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> >> For more information on JIRA, see: http://www.atlassian.com/software/jira
> >>
> >>
>
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
>

--
Best Regards, Edward J. Yoon
@eddieyoon

--
Thomas Jungblut
Berlin <[email protected]>
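[Editor's note: the "input key/value to vertex parser" proposed in the thread could look roughly like the sketch below. This is purely illustrative, not Hama's actual API: the class names `TextVertexParser` and `ParsedVertex`, and the tab-separated adjacency-list format, are assumptions. The idea is that the framework feeds the user raw key/value records from any InputFormat, and user code turns each record into a vertex plus its outgoing edges, so vertices need not be Writable themselves.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed key/value-to-vertex parser that would
// run while loading vertices. All names here are illustrative assumptions.
public class TextVertexParser {

    // Minimal holder for a parsed vertex: an id plus its adjacency list.
    static final class ParsedVertex {
        final String id;
        final List<String> edges;

        ParsedVertex(String id, List<String> edges) {
            this.id = id;
            this.edges = edges;
        }
    }

    // Parse one tab-separated adjacency-list line: "id\tneighbor1\tneighbor2...".
    // In the proposed design, this is the only piece the user would implement.
    static ParsedVertex parse(String line) {
        String[] tokens = line.split("\t");
        return new ParsedVertex(tokens[0],
                new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length)));
    }

    public static void main(String[] args) {
        // E.g. one row of a Wikipedia-style link set.
        ParsedVertex v = parse("Hama\tBSP\tApache");
        System.out.println(v.id + " -> " + v.edges);
    }
}
```

As a side note, the task counts in the thread follow directly from the HDFS chunk size: 10 GB of bloated input at 64 MB per chunk gives 10240 / 64 = 160 splits (one BSP task each), while the compact 1 GB representation gives 1024 / 64 = 16.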
