Thanks Edward, it looks good.

Tommaso
2013/1/8 Edward J. Yoon <[email protected]>

> Please review this:
>
> http://wiki.apache.org/hama/Partitioning
>
> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <[email protected]> wrote:
> > I mean, the pre-partitioning or resizing partitions is really important.
> >
> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
> >> This is another talk ...
> >>
> >> Unlike MapReduce, I think, Hama BSP will handle tasks that input is
> >> small in size but large in computational complexity, such as graph,
> >> sparse matrix, machine learning algorithms.
> >>
> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
> >>> Even though the numbers of splits and tasks are the same, user-defined
> >>> partitioning job should be run (because it is not only for resizing
> >>> partitions. For example, range partitioning of unsorted data set or
> >>> hash key partitioning, ..., etc).
> >>>
> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
> >>>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's named
> >>>>>    as so in the HEAD (1429573) of trunk. It isn't removed but it isn't
> >>>>>    referred to anywhere else. I can't find any references to it in the
> >>>>>    workspace.
> >>>>>
> >>>>
> >>>> It is referred in BSPJob#waitForCompletion function as a separate BSP job
> >>>> to create the specified splits.
> >>>>
> >>>>> 2. job.setPartitioner is the same as setting
> >>>>>    "bsp.input.partitioner.class" . Anyways , So acc. to me partitions are
> >>>>>    not being created because of which the following happens.
> >>>>>    If I am running the task on local fs and not hdfs, there's just one
> >>>>>    input split and even if I set a partitioner to create two partitions and
> >>>>>    set bsp.setNumTasks(2) , this is overriden and only one task is
> >>>>>    executed.
> >>>>>    See BSPJobClient#submitJobInternal()
> >>>>>    where it does the following
> >>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks)); Line
> >>>>>    326.
> >>>>>
> >>>>
> >>>> This job is set to run if the number of splits != number of Tasks or if
> >>>> forced by the configuration. I can share my HAMA-700 current state of patch
> >>>> with you.
> >>>>
> >>>>> 3. So here is what I think is happening, Partitioner is not in the
> >>>>>    codepath (try putting a breakpoint inside the partitioner and executing
> >>>>>    and non graph bsp task), so partitions are not being created and
> >>>>>    writeSplits() is returning 1.
> >>>>>    [ writeSplits() returns the number of splits in the input. ]
> >>>>>
> >>>>
> >>>> Probably because it is running as a separate process?
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
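
For reference, the rule discussed in the quoted thread (run the partitioning job when the number of splits differs from the number of tasks, or when the configuration forces it) together with a plain hash-key partitioner can be sketched roughly as below. This is only an illustration under assumptions: the class `PartitioningSketch` and the methods `needsPartitioningJob` and `hashPartition` are hypothetical names, not Hama's actual API, and the sign-bit-masking formula is a generic hashing technique rather than whatever the Hama partitioner really does.

```java
// Illustrative sketch only: every name here is hypothetical, not Hama's API.
final class PartitioningSketch {

    // Mirrors the rule quoted in the thread: the partitioning job runs when
    // the number of input splits differs from the number of tasks, or when
    // the configuration forces it (e.g. for a user-defined partitioner).
    static boolean needsPartitioningJob(int numSplits, int numTasks, boolean forced) {
        return forced || numSplits != numTasks;
    }

    // Generic hash-key partitioning: clearing the sign bit keeps the result
    // in the range [0, numTasks) even for negative hash codes.
    static int hashPartition(Object key, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        // Local-fs case from the thread: one input split, two requested tasks.
        System.out.println(needsPartitioningJob(1, 2, false));
        // Same counts, but a user-defined partitioner forces the job.
        System.out.println(needsPartitioningJob(2, 2, true));
        // Assign an example key to one of two tasks.
        System.out.println(hashPartition("vertex-42", 2));
    }
}
```

With this reading, the local-fs case (one split, `bsp.setNumTasks(2)`) would trigger the partitioning job, and the `forced` flag covers Edward's point that a user-defined partitioning should run even when the split and task counts already match.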
