Sorry, I sent the last mail by mistake before it was finished.
Hi Thomas,
I just read this part of the code in the submitJobInternal() function of
org.apache.hama.bsp.BSPJobClient. As you mentioned, raw BSPs have the
opportunity to partition before the job:
    // Create the splits for the job
    LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
    if (job.getConf().get("bsp.input.partitioner.class") != null
        && !job.getConf()
            .getBoolean("hama.graph.runtime.partitioning", false)) {
      job = partition(job, maxTasks);
      maxTasks = job.getInt("hama.partition.count", maxTasks);
    }
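To check that I read the condition correctly, here is a minimal sketch of it.
The class PartitionDecisionSketch and the plain Map used as the configuration
are hypothetical stand-ins of mine, not the Hama API; only the two property
names come from the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the job configuration check; not the Hama API.
class PartitionDecisionSketch {

  // Mirrors the quoted condition: partition on the client at submit time only
  // when a partitioner class is configured and runtime partitioning is off.
  static boolean partitionsAtSubmit(Map<String, String> conf) {
    boolean hasPartitioner = conf.get("bsp.input.partitioner.class") != null;
    // Assumption: an unset flag defaults to false, as in getBoolean(..., false).
    boolean runtimePartitioning = Boolean.parseBoolean(
        conf.getOrDefault("hama.graph.runtime.partitioning", "false"));
    return hasPartitioner && !runtimePartitioning;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println("no partitioner set: " + partitionsAtSubmit(conf));

    // "my.example.Partitioner" is a made-up class name for illustration.
    conf.put("bsp.input.partitioner.class", "my.example.Partitioner");
    System.out.println("partitioner set: " + partitionsAtSubmit(conf));

    conf.put("hama.graph.runtime.partitioning", "true");
    System.out.println("runtime partitioning on: " + partitionsAtSubmit(conf));
  }
}
```

If this reading is right, a graph job that turns on runtime partitioning
skips the client-side partition() call even when a partitioner class is set.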
By the way, if I do not partition the file at the submission stage, when and
where will the vertices in the file be partitioned and assigned to each
task in Hama Graph? On the master node before the job runs? Or on each groom
server in the first superstep?
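For reference, this is how I picture the "usual hash partitioning" you
mentioned, applied to the vertex IDs from my example input. It is only my
assumption of the idea, not Hama's actual partitioner: the class
HashPartitionSketch, its method names, and the task count of 3 are all made up
for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of hash partitioning; not Hama's actual partitioner.
class HashPartitionSketch {

  // Map a vertex ID to a task index in [0, numTasks).
  static int partitionFor(String vertexId, int numTasks) {
    // floorMod keeps the result non-negative even for negative hash codes.
    return Math.floorMod(vertexId.hashCode(), numTasks);
  }

  // Group vertex IDs by the task index they hash to.
  static Map<Integer, List<String>> assign(List<String> vertexIds, int numTasks) {
    Map<Integer, List<String>> byTask = new TreeMap<>();
    for (String id : vertexIds) {
      byTask.computeIfAbsent(partitionFor(id, numTasks), k -> new ArrayList<>())
          .add(id);
    }
    return byTask;
  }

  public static void main(String[] args) {
    // First column of each line in the example input file.
    List<String> ids = Arrays.asList(
        "stackoverflow.com", "facebook.com", "yahoo.com",
        "twitter.com", "nasa.gov", "youtube.com");
    // The task count of 3 is an arbitrary value for illustration.
    System.out.println(assign(ids, 3));
  }
}
```

With a scheme like this the assignment depends only on the hash of the vertex
ID, so nothing guarantees that the tasks end up with equal numbers of vertices.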
Thanks
Walker.
2012/9/19 顾荣 <[email protected]>
> Hi Thomas,
>
> I just read this part of the code in the submitJobInternal() function of
> org.apache.hama.bsp.BSPJobClient. As you mentioned, raw BSPs have the
> opportunity to partition before the job:
>     // Create the splits for the job
>     LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
>     if (job.getConf().get("bsp.input.partitioner.class") != null
>         && !job.getConf()
>             .getBoolean("hama.graph.runtime.partitioning", false)) {
>       job = partition(job, maxTasks);
>       maxTasks = job.getInt("hama.partition.count", maxTasks);
>     }
>
>
> 2012/9/19 Thomas Jungblut <[email protected]>
>
>> Hey,
>>
>> the file is split the way Hadoop does it, as defined by the input format.
>> It is partitioned at runtime. Raw BSPs have the opportunity to partition
>> before the job, but this is not very scalable, so we have not done this
>> in the graph algorithms. There is no load balancing besides the usual hash
>> partitioning. However, you can write your own partitioner to distribute
>> the vertices. We are going to provide work stealing in the future so that
>> load balancing gets better.
>>
>>
>> 2012/9/19 Yuesheng Hu <[email protected]>
>>
>> > org.apache.hama.graph.GraphJobRunner is the most important class you
>> > should read, along with the other classes in org.apache.hama.graph.
>> >
>> >
>> > 2012/9/19 顾荣 <[email protected]>
>> >
>> > > Hi all, I have some questions about your design in HamaGraph. Let me
>> > > take the PageRank example to illustrate my questions.
>> > >
>> > > I have 3 Groom Servers, each with 3 free BSP task nodes, in my Hama
>> > > cluster. The input file is as below.
>> > >
>> > > "stackoverflow.com yahoo.com
>> > > facebook.com twitter.com google.com nasa.gov
>> > > yahoo.com nasa.gov stackoverflow.com
>> > > twitter.com google.com facebook.com
>> > > nasa.gov yahoo.com stackoverflow.com
>> > > youtube.com google.com yahoo.com
>> > > "
>> > > In this case, there are 6 vertices. How do you assign them among these
>> > > task nodes? Can it guarantee load balancing? And do you support a way
>> > > for users to customize their own vertex assignment policy? I am quite
>> > > confused by the task split part of Hama: from its source code it seems
>> > > the same as Hadoop (by input splits), but it works differently. And is
>> > > the task split part of HamaBSP the same as that of HamaGraph?
>> > >
>> > > Would you please give some info about that? If you are too busy to
>> > > answer my questions, please kindly point out which classes or
>> > > functions of the source code implement what I am confused about, and I
>> > > will read them myself.
>> > >
>> > > Anyway, thanks again.
>> > >
>> > > Walker
>> > >
>> >
>>
>
>