Re: Some questions about execution workflow of HamaGraph.

顾荣 Wed, 19 Sep 2012 08:43:19 -0700

Haha, I got it. It makes sense. This all happens in the loadVertices()
function of the GraphJobRunner class, at the setup stage of each task.
The first few supersteps of a job are spent on partitioning and assigning
vertices among the tasks.
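
For illustration only, here is a minimal sketch of the hash-partitioning idea
discussed in this thread. It is plain Java and does not use Hama's own classes.
The class name HashPartitionSketch, the helpers partitionFor() and assign(),
and the choice of 3 tasks are all assumptions made for this example; the real
logic lives in GraphJobRunner.loadVertices() and may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HashPartitionSketch {

  // Hypothetical helper: pick a task index for a vertex ID by hashing it.
  // The sign bit is masked so the index is never negative.
  static int partitionFor(String vertexId, int numTasks) {
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
  }

  // Hypothetical helper: group vertex IDs by the task that would own them.
  static Map<Integer, List<String>> assign(List<String> vertexIds, int numTasks) {
    Map<Integer, List<String>> byTask = new TreeMap<>();
    for (String id : vertexIds) {
      int task = partitionFor(id, numTasks);
      byTask.computeIfAbsent(task, k -> new ArrayList<>()).add(id);
    }
    return byTask;
  }

  public static void main(String[] args) {
    // The six source vertices from the PageRank input file in this thread.
    List<String> vertices = Arrays.asList(
        "stackoverflow.com", "facebook.com", "yahoo.com",
        "twitter.com", "nasa.gov", "youtube.com");

    // Assumed task count for the example; a real job derives it from the splits.
    int numTasks = 3;

    assign(vertices, numTasks).forEach(
        (task, ids) -> System.out.println("task " + task + " -> " + ids));
  }
}
```

Every task can compute the same owner for a given vertex ID without any
coordination, which is why this works at runtime on each task. It gives no
balance guarantee, though: with only a handful of vertices some tasks may end
up with none. A custom assignment policy would amount to replacing the body of
partitionFor() in this sketch.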


Thanks for explaining this patiently to me, Thomas and Yuesheng.

Regards.
Walker

2012/9/19 Thomas Jungblut <[email protected]>

> To be more detailed, it is done at runtime in each of the tasks.
> Each task gets its block from HDFS and then starts partitioning.
>
> 2012/9/19 Thomas Jungblut <[email protected]>
>
> > As Yuesheng Hu already mentioned: in the GraphJobRunner
> > method loadVertices, during the setup stage.
> >
> >
> > 2012/9/19 顾荣 <[email protected]>
> >
> >> Sorry, I sent the last mail by mistake; it was unfinished.
> >>
> >> Hi Thomas,
> >>
> >> I just read this part of the code in the submitJobInternal() function of
> >> org.apache.hama.bsp.BSPJobClient.
> >> As you mentioned, raw BSPs have the opportunity to partition before the
> >> job:
> >>
> >>       // Create the splits for the job
> >>       LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
> >>       if (job.getConf().get("bsp.input.partitioner.class") != null
> >>           && !job.getConf()
> >>               .getBoolean("hama.graph.runtime.partitioning", false)) {
> >>         job = partition(job, maxTasks);
> >>         maxTasks = job.getInt("hama.partition.count", maxTasks);
> >>       }
> >>
> >> By the way, if I do not partition the file at the submission stage, when
> >> and where will the vertices in the file be partitioned and assigned to
> >> each task in Hama Graph? On the master node before running? Or on each
> >> groom server at the first superstep?
> >>
> >> Thanks
> >>
> >> Walker.
> >>
> >>
> >>
> >> 2012/9/19 顾荣 <[email protected]>
> >>
> >> > Hi Thomas,
> >> >
> >> > I just read this part of the code in the submitJobInternal() function of
> >> > org.apache.hama.bsp.BSPJobClient.
> >> > As you mentioned, raw BSPs have the opportunity to partition before the
> >> > job:
> >> >
> >> >       // Create the splits for the job
> >> >       LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
> >> >       if (job.getConf().get("bsp.input.partitioner.class") != null
> >> >           && !job.getConf()
> >> >               .getBoolean("hama.graph.runtime.partitioning", false)) {
> >> >         job = partition(job, maxTasks);
> >> >         maxTasks = job.getInt("hama.partition.count", maxTasks);
> >> >       }
> >> >
> >> >
> >> > 2012/9/19 Thomas Jungblut <[email protected]>
> >> >
> >> >> Hey,
> >> >>
> >> >> the file is split the way Hadoop does it, as defined by the input
> >> >> format. It will be partitioned during runtime. Raw BSPs have the
> >> >> opportunity to partition before the job, but this is not very scalable,
> >> >> so we have not done this in graph algorithms. There is no load
> >> >> balancing besides the usual hash partitioning. However, you can write
> >> >> your own partitioner to distribute the vertices. We are going to
> >> >> provide work stealing in the future so the load balancing gets better.
> >> >>
> >> >>
> >> >> 2012/9/19 Yuesheng Hu <[email protected]>
> >> >>
> >> >> > org.apache.hama.graph.GraphJobRunner is the most important class you
> >> >> > should read, along with the other classes in org.apache.hama.graph.
> >> >> >
> >> >> >
> >> >> > 2012/9/19 顾荣 <[email protected]>
> >> >> >
> >> >> > > Hi all, I have some questions about your design in HamaGraph. Let me
> >> >> > > take the PageRank example to illustrate my questions.
> >> >> > >
> >> >> > > I have 3 Groom Servers, each with 3 free BSP task nodes, in my Hama
> >> >> > > cluster. The input file is as below.
> >> >> > >
> >> >> > > "stackoverflow.com    yahoo.com
> >> >> > > facebook.com    twitter.com    google.com    nasa.gov
> >> >> > > yahoo.com    nasa.gov    stackoverflow.com
> >> >> > > twitter.com    google.com    facebook.com
> >> >> > > nasa.gov    yahoo.com    stackoverflow.com
> >> >> > > youtube.com    google.com    yahoo.com
> >> >> > > "
> >> >> > > In this case, there are 6 vertices. How do you assign them among these
> >> >> > > task nodes? Can it guarantee load balancing? And do you support a
> >> >> > > function that lets users supply their own vertex assignment policy? I
> >> >> > > am so confused by the task split part of Hama: from its source code it
> >> >> > > seems the same as Hadoop (by input splits), but it works differently.
> >> >> > > And is the task split part of HamaBSP the same as in HamaGraph?
> >> >> > >
> >> >> > > Would you please give some info about that? If you are too busy to
> >> >> > > answer my questions, please kindly point out in which classes or
> >> >> > > functions of the source code you implemented what I am confused about.
> >> >> > > I think I can read more of it myself.
> >> >> > >
> >> >> > > Anyway, thanks again.
> >> >> > >
> >> >> > > Walker
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>
