Haha, I got it. It makes sense. This all happens in the loadVertices() function of the GraphJobRunner class, at the setup stage of each task. The first few supersteps of a job are spent on partitioning and assigning vertices among the tasks.
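For illustration, the hash-partitioning idea discussed in this thread can be sketched in plain Java. This is only a minimal sketch, not Hama's actual code: the class HashPartitionSketch, the methods partitionFor and assign, and the choice of 9 tasks (3 groom servers with 3 task slots each) are assumptions made for the example, and the real GraphJobRunner.loadVertices() may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it does not use or mirror any Hama API.
class HashPartitionSketch {

    // Maps a vertex ID to a task index in [0, numTasks).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String vertexId, int numTasks) {
        return Math.floorMod(vertexId.hashCode(), numTasks);
    }

    // Groups vertex IDs by the task index that the hash assigns them to.
    static Map<Integer, List<String>> assign(List<String> vertexIds, int numTasks) {
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (String id : vertexIds) {
            byTask.computeIfAbsent(partitionFor(id, numTasks), k -> new ArrayList<>()).add(id);
        }
        return byTask;
    }

    public static void main(String[] args) {
        // The six source vertices (first column) of the PageRank input in this thread.
        List<String> vertices = Arrays.asList(
            "stackoverflow.com", "facebook.com", "yahoo.com",
            "twitter.com", "nasa.gov", "youtube.com");
        // Assumed task count: 3 groom servers x 3 task slots.
        int numTasks = 9;
        assign(vertices, numTasks)
            .forEach((task, ids) -> System.out.println("task " + task + " -> " + ids));
    }
}
```

Since the assignment depends only on the hash of the vertex ID, nothing guarantees an even spread across tasks, which matches the remark in this thread that there is no load balancing beyond the usual hash partitioning.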
Thanks for explaining this patiently to me, Thomas and Yuesheng.

Regards.
Walker

2012/9/19 Thomas Jungblut <[email protected]>

> To be more detailed, it will be done during runtime in each of the tasks.
> Each task gets its block from HDFS and then starts partitioning.
>
> 2012/9/19 Thomas Jungblut <[email protected]>
>
> > Like Yuesheng Hu already mentioned, in the GraphJobRunner
> > method loadVertices in the setup stage.
> >
> >
> > 2012/9/19 顾荣 <[email protected]>
> >
> >> Sorry, I sent the last mail by mistake; it was unfinished.
> >>
> >> Hi Thomas,
> >>
> >> I just read this part of the code in the submitJobInternal() function of
> >> org.apache.hama.bsp.BSPJobClient.
> >> As you mentioned, raw BSPs have the opportunity to partition before the
> >> job:
> >>
> >> // Create the splits for the job
> >> LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
> >> if (job.getConf().get("bsp.input.partitioner.class") != null
> >>     && !job.getConf()
> >>         .getBoolean("hama.graph.runtime.partitioning", false)) {
> >>   job = partition(job, maxTasks);
> >>   maxTasks = job.getInt("hama.partition.count", maxTasks);
> >> }
> >>
> >> By the way, if I do not partition the file at the submitting stage, when and
> >> where will the vertices in the file be partitioned and assigned to each
> >> task in Hama Graph? On the master node before running? Or on each groom
> >> server at the first superstep?
> >>
> >> Thanks
> >>
> >> Walker.
> >>
> >>
> >> 2012/9/19 顾荣 <[email protected]>
> >>
> >> > Hi Thomas,
> >> >
> >> > I just read this part of the code in the submitJobInternal() function of
> >> > org.apache.hama.bsp.BSPJobClient.
> >> > As you mentioned, raw BSPs have the opportunity to partition before the
> >> > job:
> >> >
> >> > // Create the splits for the job
> >> > LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
> >> > if (job.getConf().get("bsp.input.partitioner.class") != null
> >> >     && !job.getConf()
> >> >         .getBoolean("hama.graph.runtime.partitioning", false)) {
> >> >   job = partition(job, maxTasks);
> >> >   maxTasks = job.getInt("hama.partition.count", maxTasks);
> >> > }
> >> >
> >> >
> >> > 2012/9/19 Thomas Jungblut <[email protected]>
> >> >
> >> >> Hey,
> >> >>
> >> >> the file is getting split like Hadoop does it, defined by the
> >> >> inputformat.
> >> >> It will be partitioned during runtime. Raw BSPs have the opportunity to
> >> >> partition before the job, but this is not so scalable, so we have not
> >> >> done this in graph algorithms. There is no load balancing besides the
> >> >> usual hash partitioning. However, you can write your own partitioner to
> >> >> distribute the vertices. We are going to provide work stealing in the
> >> >> future so the load balancing gets better.
> >> >>
> >> >>
> >> >> 2012/9/19 Yuesheng Hu <[email protected]>
> >> >>
> >> >> > org.apache.hama.graph.GraphJobRunner is the most important class you
> >> >> > should read, also the other classes in org.apache.hama.graph
> >> >> >
> >> >> >
> >> >> > 2012/9/19 顾荣 <[email protected]>
> >> >> >
> >> >> > > Hi All, I have some questions about your design in HamaGraph. Let me
> >> >> > > take the PageRank example to illustrate my questions.
> >> >> > >
> >> >> > > I have 3 Groom Servers, each with 3 free BSP task nodes, in my Hama
> >> >> > > cluster. The input file is as below.
> >> >> > >
> >> >> > > "stackoverflow.com yahoo.com
> >> >> > > facebook.com twitter.com google.com nasa.gov
> >> >> > > yahoo.com nasa.gov stackoverflow.com
> >> >> > > twitter.com google.com facebook.com
> >> >> > > nasa.gov yahoo.com stackoverflow.com
> >> >> > > youtube.com google.com yahoo.com
> >> >> > > "
> >> >> > > In this case, there are 6 vertices. How do you assign them among these
> >> >> > > task nodes? Can it guarantee load balancing? And do you support a
> >> >> > > function that lets users supply their own vertex assignment policy? I
> >> >> > > am so confused by the task split part of Hama: from its source code it
> >> >> > > seems the same as Hadoop (by input splits), but it works differently.
> >> >> > > And is the task split part of HamaBSP the same as in HamaGraph?
> >> >> > >
> >> >> > > Would you please give some info about that? If you are too busy to
> >> >> > > answer my questions, please kindly point out in which classes or
> >> >> > > functions of the source code you implemented what I am confused about;
> >> >> > > I think I can read it myself.
> >> >> > >
> >> >> > > Anyway, thanks again.
> >> >> > >
> >> >> > > Walker
