Thanks, let me take a careful look at it. From a cursory look, I think I understand the basic idea. Was there a reason for moving the PartitioningJob from BSPJob into BSPJobClient? BTW, the current partitioner doesn't work as intended: only the default HashPartitioner works fine, and plugging in a custom partitioner runs into problems.
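For reference, the difference between default hash partitioning and a custom scheme can be sketched as below. This is a minimal, self-contained illustration and not Hama code: the Partitioner interface is a hypothetical stand-in for the real contract, and HashSketch, RangeSketch and maxKey are names made up for the example.

```java
public class PartitionerSketch {

    /** Hypothetical stand-in for a partitioner contract; not copied from Hama. */
    interface Partitioner<K, V> {
        int getPartition(K key, V value, int numTasks);
    }

    /** Hash partitioning: maps any key to a bucket in [0, numTasks). */
    static class HashSketch<K, V> implements Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numTasks) {
            // floorMod keeps the result non-negative even for negative hash codes.
            return Math.floorMod(key.hashCode(), numTasks);
        }
    }

    /** Range partitioning over integer keys assumed to lie in [0, maxKey). */
    static class RangeSketch<V> implements Partitioner<Integer, V> {
        private final int maxKey; // assumed upper bound of the key space

        RangeSketch(int maxKey) {
            this.maxKey = maxKey;
        }

        @Override
        public int getPartition(Integer key, V value, int numTasks) {
            // Width of each contiguous key range, rounded up.
            int width = (maxKey + numTasks - 1) / numTasks;
            // Clamp so that the last range absorbs any remainder.
            return Math.min(key / width, numTasks - 1);
        }
    }

    public static void main(String[] args) {
        Partitioner<String, String> hash = new HashSketch<>();
        Partitioner<Integer, String> range = new RangeSketch<>(100);
        int numTasks = 2;

        System.out.println("hash(\"vertex-7\") -> " + hash.getPartition("vertex-7", "v", numTasks));
        for (int key : new int[] {10, 49, 50, 99}) {
            System.out.println("range(" + key + ") -> " + range.getPartition(key, "v", numTasks));
        }
    }
}
```

With two tasks and maxKey = 100, the range variant sends keys 0..49 to task 0 and keys 50..99 to task 1, which only has an effect if the partitioner is actually invoked on the input.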
Let's resolve the partitioning completely before the spilling message queue.

--
Regards,
Apurv Verma

On Tue, Jan 8, 2013 at 8:39 PM, Suraj Menon <[email protected]> wrote:

> Hey Apurv, please check HAMA-700.patch_Jan7. Feel free to provide
> suggestions or even work on it.
>
> Thanks,
> Suraj
>
> On Tue, Jan 8, 2013 at 9:21 AM, Apurv Verma <[email protected]> wrote:
>
>> Hey Edward,
>> There was a compile bug which i fixed temporarily. isPartitioned was not
>> being initialized. Could you please check the last commit. I have currently
>> initialized it to false but I guess this should be configurable.
>> There was some jira where we wanted partitioning to be skipped if user
>> thinks his data is already partitioned.
>>
>> Thanks again.
>>
>> --
>> Regards,
>> Apurv Verma
>>
>> On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <[email protected]> wrote:
>>
>>> Thanks, then I'll finish tomorrow. Please feel free to comment there.
>>>
>>> On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili <[email protected]> wrote:
>>>> thanks Edward, it looks good.
>>>> Tommaso
>>>>
>>>> 2013/1/8 Edward J. Yoon <[email protected]>
>>>>
>>>>> Please review this:
>>>>>
>>>>> http://wiki.apache.org/hama/Partitioning
>>>>>
>>>>> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <[email protected]> wrote:
>>>>>> I mean, the pre-partitioning or resizing partitions is really important.
>>>>>>
>>>>>> On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
>>>>>>> This is another talk ...
>>>>>>>
>>>>>>> Unlike MapReduce, I think, Hama BSP will handle tasks that input is
>>>>>>> small in size but large in computational complexity, such as graph,
>>>>>>> sparse matrix, machine learning algorithms.
>>>>>>>
>>>>>>> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
>>>>>>>> Even though the numbers of splits and tasks are the same, user-defined
>>>>>>>> partitioning job should be run (because it is not only for resizing
>>>>>>>> partitions. For example, range partitioning of unsorted data set or
>>>>>>>> hash key partitioning, ..., etc).
>>>>>>>>
>>>>>>>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
>>>>>>>>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's
>>>>>>>>>> named as so in the HEAD (1429573) of trunk. It isn't removed but it
>>>>>>>>>> isn't referred to anywhere else. I can't find any references to it
>>>>>>>>>> in the workspace.
>>>>>>>>>
>>>>>>>>> It is referred in BSPJob#waitForCompletion function as a separate BSP
>>>>>>>>> job to create the specified splits.
>>>>>>>>>
>>>>>>>>>> 2. job.setPartitioner is the same as setting
>>>>>>>>>> "bsp.input.partitioner.class" . Anyways , So acc. to me partitions
>>>>>>>>>> are not being created because of which the following happens.
>>>>>>>>>> If I am running the task on local fs and not hdfs, there's just one
>>>>>>>>>> input split and even if I set a partitioner to create two partitions
>>>>>>>>>> and set bsp.setNumTasks(2) , this is overriden and only one task is
>>>>>>>>>> executed.
>>>>>>>>>> See BSPJobClient#submitJobInternal()
>>>>>>>>>> where it does the following
>>>>>>>>>> job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>>>>>>>>>> Line 326.
>>>>>>>>>
>>>>>>>>> This job is set to run if the number of splits != number of Tasks or
>>>>>>>>> if forced by the configuration. I can share my HAMA-700 current state
>>>>>>>>> of patch with you.
>>>>>>>>>
>>>>>>>>>> 3. So here is what I think is happening, Partitioner is not in the
>>>>>>>>>> codepath (try putting a breakpoint inside the partitioner and
>>>>>>>>>> executing and non graph bsp task), so partitions are not being
>>>>>>>>>> created and writeSplits() is returning 1.
>>>>>>>>>> [ writeSplits() returns the number of splits in the input. ]
>>>>>>>>>
>>>>>>>>> Probably because it is running as a separate process?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>> @eddieyoon
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Edward J. Yoon
>>>>>>> @eddieyoon
>>>>>>
>>>>>> --
>>>>>> Best Regards, Edward J. Yoon
>>>>>> @eddieyoon
>>>>>
>>>>> --
>>>>> Best Regards, Edward J. Yoon
>>>>> @eddieyoon
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
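The two positions in the quoted discussion about when the partitioning job should run can be sketched as follows. This is only an illustration under stated assumptions: the class, method and parameter names (runWhenCountsDiffer, runWithCustomPartitioner, forced, hasCustomPartitioner) are hypothetical and are not taken from the Hama source or from the HAMA-700 patch.

```java
public class PartitioningDecisionSketch {

    /**
     * Rule as described by Suraj in the thread: run the partitioning job when
     * the number of splits differs from the number of tasks, or when forced
     * by configuration.
     */
    static boolean runWhenCountsDiffer(int numSplits, int numTasks, boolean forced) {
        return forced || numSplits != numTasks;
    }

    /**
     * Variant following Edward's point: a user-defined partitioner should
     * trigger the job even when the split and task counts already match.
     */
    static boolean runWithCustomPartitioner(int numSplits, int numTasks,
                                            boolean forced, boolean hasCustomPartitioner) {
        return hasCustomPartitioner || runWhenCountsDiffer(numSplits, numTasks, forced);
    }

    public static void main(String[] args) {
        // Local fs case from the thread: one input split, two requested tasks.
        System.out.println(runWhenCountsDiffer(1, 2, false));            // true
        // Counts match and nothing is forced: the first rule skips the job.
        System.out.println(runWhenCountsDiffer(2, 2, false));            // false
        // Counts match but a custom partitioner is set: the variant runs it.
        System.out.println(runWithCustomPartitioner(2, 2, false, true)); // true
    }
}
```

Under the first rule a range or hash-key partitioner would be skipped whenever the counts happen to match, which is the case Edward's message objects to.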
