Re: [jira] [Commented] (HAMA-642) Make GraphRunner disk based

Thomas Jungblut Sat, 29 Sep 2012 05:09:38 -0700

It always fails if the input is large?^^
Do you have stack traces? If there are filesystem problems, this isn't
unexpected. Maybe a disk filled up.


2012/9/29 Edward J. Yoon <[email protected]>

> > - Does this fail always or just sometimes?
>
> Always
>
> > - When it finishes, is the result wrong? Just curious, how do you compare
> 20 GB of text files? ;D
>
> Never finishes.
>
> > - In case it is really the combiner, does pagerank work without problems?
>
> Never finishes if input is large.
>
> On Sep 29, 2012, at 5:07 AM, "Thomas Jungblut (JIRA)" <[email protected]>
> wrote:
>
> >
> >    [
> https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465866#comment-13465866]
> >
> > Thomas Jungblut commented on HAMA-642:
> > --------------------------------------
> >
> > A race is not good. We have to investigate a bit deeper, I guess. I don't
> think that there is a concurrency problem inside of jdbm, but I will have a
> look. Maybe there are some resources that are static; however, each task has
> its own mutually exclusive "database", so I don't see a problem there.
> >
> > My first guess was the use of the combiner. So here are my questions:
> > - Does this fail always or just sometimes?
> > - When it finishes, is the result wrong? Just curious, how do you compare
> 20 GB of text files? ;D
> > - In case it is really the combiner, does pagerank work without problems?
> >
> > I will build a smaller cluster in the near future to test these things more
> efficiently.
> >
> >> Make GraphRunner disk based
> >> ---------------------------
> >>
> >>                Key: HAMA-642
> >>                URL: https://issues.apache.org/jira/browse/HAMA-642
> >>            Project: Hama
> >>         Issue Type: Improvement
> >>         Components: graph
> >>   Affects Versions: 0.5.0
> >>           Reporter: Thomas Jungblut
> >>           Assignee: Edward J. Yoon
> >>        Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch,
> HAMA-scale_1.patch, HAMA-scale_2.patch, HAMA-scale_3.patch,
> HAMA-scale_4.patch
> >>
> >>
> >> To improve scalability we can make the graph runner disk based,
> >> which basically means:
> >> - We have just a single Vertex instance that gets refilled.
> >> - We directly write vertices to disk after partitioning.
> >> - In every superstep we iterate over the vertices on disk, fill the
> vertex instance and call the user's compute function.
> >> Problems:
> >> - State other than the vertex value can't be stored easily.
> >> - How do we deal with random access after messages have arrived?
> >> So I think we should make the graph runner more hybrid, like using the
> queues we have implemented in the messaging. So the graph runner can be
> configured to run completely on disk, in cached mode or in in-memory mode.
> >
>
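
For reference, the disk-based loop from the issue description quoted above
(write vertices to disk once, then refill a single Vertex instance while
iterating in each superstep) could look roughly like the following. This is
only a minimal sketch under assumptions: DiskVertexStore, Vertex, inTempFile
and the on-disk record layout are hypothetical names and choices made for
illustration, not Hama classes or the format used by the attached patches.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical sketch: DiskVertexStore and Vertex are illustrative names,
// not classes from Hama.
class DiskVertexStore {

    // The single vertex instance that gets refilled for every record on disk.
    static final class Vertex {
        int id;
        double value;
        int[] edges = new int[0];
        int edgeCount; // number of valid entries in edges
    }

    private final Path file;

    DiskVertexStore(Path file) {
        this.file = file;
    }

    // Convenience factory that backs the store with a temporary file.
    static DiskVertexStore inTempFile() {
        try {
            Path tmp = Files.createTempFile("vertices", ".bin");
            tmp.toFile().deleteOnExit();
            return new DiskVertexStore(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes vertices straight to disk, as would happen after partitioning.
    // Each row is {vertexId, neighbourId...} and must not be empty.
    void write(int[][] adjacency, double initialValue) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(adjacency.length); // record count header
            for (int[] row : adjacency) {
                out.writeInt(row[0]);
                out.writeDouble(initialValue);
                out.writeInt(row.length - 1);
                for (int i = 1; i < row.length; i++) {
                    out.writeInt(row[i]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One superstep: stream over the file, refill the shared vertex and hand
    // it to the user's compute function. Returns the number of vertices seen.
    int superstep(Consumer<Vertex> compute) {
        Vertex vertex = new Vertex();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            for (int n = 0; n < count; n++) {
                vertex.id = in.readInt();
                vertex.value = in.readDouble();
                vertex.edgeCount = in.readInt();
                if (vertex.edges.length < vertex.edgeCount) {
                    vertex.edges = new int[vertex.edgeCount];
                }
                for (int i = 0; i < vertex.edgeCount; i++) {
                    vertex.edges[i] = in.readInt();
                }
                compute.accept(vertex);
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch only reads sequentially and never writes updated values back, so
it does not address the two problems listed in the issue (extra per-vertex
state and random access once messages have arrived).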
