As Yuesheng Hu already mentioned, look at GraphJobRunner: this happens in its loadVertices method, during the setup stage.
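
To make the "usual hash partitioning" from the thread a bit more concrete, here is a minimal standalone sketch. It is not Hama's actual code: the class HashPartitionSketch and the helpers partitionFor and assign are hypothetical names made up for this illustration, and the task count of 9 is just an assumption taken from the 3 grooms x 3 tasks in the question. It only shows the general idea of mapping a vertex ID to a task by its hash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HashPartitionSketch {

    // Hypothetical helper, not part of Hama: maps a vertex ID to a task index.
    static int partitionFor(String vertexId, int numTasks) {
        // Mask the sign bit so the index is never negative.
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    // Hypothetical helper: groups vertex IDs by the task index they hash to.
    static Map<Integer, List<String>> assign(List<String> vertexIds, int numTasks) {
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (String id : vertexIds) {
            int task = partitionFor(id, numTasks);
            byTask.computeIfAbsent(task, k -> new ArrayList<>()).add(id);
        }
        return byTask;
    }

    public static void main(String[] args) {
        // The six vertices from the PageRank example input in the thread.
        List<String> vertices = Arrays.asList(
            "stackoverflow.com", "facebook.com", "yahoo.com",
            "twitter.com", "nasa.gov", "youtube.com");

        // Assumed task count: 3 groom servers with 3 task slots each.
        int numTasks = 9;

        assign(vertices, numTasks).forEach(
            (task, ids) -> System.out.println("task " + task + " -> " + ids));
    }
}
```

Note that nothing here balances the load: with few vertices, some tasks may get several and others none, which is the limitation mentioned in the thread. A custom partitioner would replace partitionFor with its own policy.
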
2012/9/19 顾荣 <[email protected]>

> Sorry, I sent the last mail by mistake; it was unfinished.
>
> Hi Thomas,
>
> I just read this part of the code in the submitJobInternal() function of
> org.apache.hama.bsp.BSPJobClient. As you mentioned, raw BSPs have the
> opportunity to partition before the job:
>
>     // Create the splits for the job
>     LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
>     if (job.getConf().get("bsp.input.partitioner.class") != null
>         && !job.getConf()
>             .getBoolean("hama.graph.runtime.partitioning", false)) {
>       job = partition(job, maxTasks);
>       maxTasks = job.getInt("hama.partition.count", maxTasks);
>     }
>
> By the way, if I do not partition the file at the submission stage, when
> and where will the vertices in the file be partitioned and assigned to
> each task in Hama Graph? On the master node before running? Or on each
> groom server at the first superstep?
>
> Thanks,
>
> Walker
>
>
> 2012/9/19 顾荣 <[email protected]>
>
> > Hi Thomas,
> >
> > I just read this part of the code in the submitJobInternal() function of
> > org.apache.hama.bsp.BSPJobClient. As you mentioned, raw BSPs have the
> > opportunity to partition before the job:
> >
> >     // Create the splits for the job
> >     LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
> >     if (job.getConf().get("bsp.input.partitioner.class") != null
> >         && !job.getConf()
> >             .getBoolean("hama.graph.runtime.partitioning", false)) {
> >       job = partition(job, maxTasks);
> >       maxTasks = job.getInt("hama.partition.count", maxTasks);
> >     }
> >
> >
> > 2012/9/19 Thomas Jungblut <[email protected]>
> >
> >> Hey,
> >>
> >> The file is split the way Hadoop does it, as defined by the input
> >> format. It is partitioned at runtime. Raw BSPs have the opportunity to
> >> partition before the job, but this is not very scalable, so we have not
> >> done this in graph algorithms. There is no load balancing besides the
> >> usual hash partitioning. However, you can write your own partitioner to
> >> distribute the vertices. We are going to provide work stealing in the
> >> future so that the load balancing gets better.
> >>
> >>
> >> 2012/9/19 Yuesheng Hu <[email protected]>
> >>
> >> > org.apache.hama.graph.GraphJobRunner is the most important class you
> >> > should read, along with the other classes in org.apache.hama.graph.
> >> >
> >> >
> >> > 2012/9/19 顾荣 <[email protected]>
> >> >
> >> > > Hi all, I have some questions about your design in HamaGraph. Let
> >> > > me take the PageRank example to illustrate my questions.
> >> > >
> >> > > I have 3 groom servers, each with 3 free BSP task slots, in my Hama
> >> > > cluster. The input file is as below.
> >> > >
> >> > > "stackoverflow.com yahoo.com
> >> > > facebook.com twitter.com google.com nasa.gov
> >> > > yahoo.com nasa.gov stackoverflow.com
> >> > > twitter.com google.com facebook.com
> >> > > nasa.gov yahoo.com stackoverflow.com
> >> > > youtube.com google.com yahoo.com
> >> > > "
> >> > > In this case, there are 6 vertices. How do you assign them among
> >> > > these task nodes? Can it guarantee load balancing? And do you
> >> > > support a way for users to supply their own vertex assignment
> >> > > policy? I am confused by the task split part of Hama: from its
> >> > > source code it seems the same as Hadoop (by input splits), but it
> >> > > works differently. And is the task split part of HamaBSP the same
> >> > > as in HamaGraph?
> >> > >
> >> > > Would you please give some info about that? If you are too busy to
> >> > > answer my questions, please kindly point out which classes or
> >> > > functions of the source code implement what I am confused about,
> >> > > and I will read them myself.
> >> > >
> >> > > Anyway, thanks again.
> >> > >
> >> > > Walker
