Hey Edward,

There was a compile error, which I have fixed temporarily: isPartitioned was not being initialized. Could you please check the last commit? For now I have initialized it to false, but I think this should be configurable. There was a JIRA issue about skipping partitioning when the user knows the data is already partitioned.
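Below is a minimal sketch of what a configurable flag could look like. It assumes a hypothetical property key, bsp.input.already.partitioned, which is not an existing Hama setting. It uses java.util.Properties so the example stays self-contained. In Hama itself the value would more likely be read through Hadoop's Configuration#getBoolean(String, boolean). The default stays false, matching the temporary fix: partition unless the user says otherwise.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: read the "input is already partitioned" flag from
 * configuration instead of hard-coding it to false.
 */
public class PartitionedFlag {

    // Hypothetical property name; not an existing Hama setting.
    static final String INPUT_PARTITIONED_KEY = "bsp.input.already.partitioned";

    /** Returns the configured flag, defaulting to false (i.e. partition the input). */
    static boolean isPartitioned(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(INPUT_PARTITIONED_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default: " + isPartitioned(conf)); // default: false
        conf.setProperty(INPUT_PARTITIONED_KEY, "true");
        System.out.println("user says partitioned: " + isPartitioned(conf)); // user says partitioned: true
    }
}
```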
Thanks again.
--
Regards,
Apurv Verma

On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <[email protected]> wrote:
> Thanks, then I'll finish tomorrow. Please feel free to comment there.
>
> On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili <[email protected]> wrote:
> > Thanks Edward, it looks good.
> > Tommaso
> >
> > 2013/1/8 Edward J. Yoon <[email protected]>
> >
> >> Please review this:
> >>
> >> http://wiki.apache.org/hama/Partitioning
> >>
> >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <[email protected]> wrote:
> >> > I mean, pre-partitioning or resizing partitions is really important.
> >> >
> >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
> >> >> This is a separate topic ...
> >> >>
> >> >> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
> >> >> small in size but large in computational complexity, such as graph,
> >> >> sparse matrix, and machine learning algorithms.
> >> >>
> >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
> >> >>> Even when the numbers of splits and tasks are the same, a user-defined
> >> >>> partitioning job should be run, because partitioning is not only for
> >> >>> resizing partitions (for example, range partitioning of an unsorted
> >> >>> data set, hash-key partitioning, etc.).
> >> >>>
> >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
> >> >>>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's
> >> >>>>> named that way in HEAD (1429573) of trunk. It hasn't been removed,
> >> >>>>> but it isn't referenced anywhere else. I can't find any references
> >> >>>>> to it in the workspace.
> >> >>>>
> >> >>>> It is referenced in BSPJob#waitForCompletion(), where it runs as a
> >> >>>> separate BSP job to create the specified splits.
> >> >>>>
> >> >>>>> 2. job.setPartitioner is the same as setting
> >> >>>>> "bsp.input.partitioner.class". Anyway, as far as I can tell,
> >> >>>>> partitions are not being created, which causes the following:
> >> >>>>> if I run the task on the local FS rather than HDFS, there is just
> >> >>>>> one input split, and even if I set a partitioner to create two
> >> >>>>> partitions and call bsp.setNumTasks(2), this is overridden and only
> >> >>>>> one task is executed. See BSPJobClient#submitJobInternal(), which
> >> >>>>> does the following at line 326:
> >> >>>>> job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
> >> >>>>
> >> >>>> This job is set to run if the number of splits != the number of
> >> >>>> tasks, or if forced by the configuration. I can share the current
> >> >>>> state of my HAMA-700 patch with you.
> >> >>>>
> >> >>>>> 3. So here is what I think is happening: the partitioner is not in
> >> >>>>> the code path (try putting a breakpoint inside the partitioner and
> >> >>>>> executing a non-graph BSP task), so partitions are not being
> >> >>>>> created and writeSplits() is returning 1.
> >> >>>>> [ writeSplits() returns the number of splits in the input. ]
> >> >>>>
> >> >>>> Probably because it is running as a separate process?
> >> >>>
> >> >>> --
> >> >>> Best Regards, Edward J. Yoon
> >> >>> @eddieyoon
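The rules discussed in this thread can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Hama's code. The methods shouldRunPartitioningJob, effectiveTaskCount, hashPartition and rangePartition are made-up names. The alreadyPartitioned flag stands in for the configurable skip suggested at the top of the thread. The sketch combines Suraj's rule (run the partitioning job when the split count differs from the task count, or when forced by configuration) with Edward's point that a user-defined partitioner should run even when the counts match. It also includes simple versions of the two schemes he mentions, hash-key and range partitioning.

```java
import java.util.Arrays;

/**
 * Hypothetical model of the partitioning decisions discussed in the thread.
 * None of these names exist in Hama.
 */
public class PartitioningDecision {

    /**
     * Whether a separate partitioning job should run before the real BSP job:
     * never if the user marked the input as already partitioned; otherwise when
     * forced, when a user-defined partitioner is set, or when #splits != #tasks.
     */
    static boolean shouldRunPartitioningJob(int numSplits, int requestedTasks,
                                            boolean forcedByConfig,
                                            boolean hasUserPartitioner,
                                            boolean alreadyPartitioned) {
        if (alreadyPartitioned) {
            return false;
        }
        return forcedByConfig || hasUserPartitioner || numSplits != requestedTasks;
    }

    /**
     * Without a partitioning job the task count is overwritten by the split
     * count (the behaviour reported for submitJobInternal()); with one, the
     * input is first rewritten into requestedTasks partitions.
     */
    static int effectiveTaskCount(int numSplits, int requestedTasks, boolean partitioningRan) {
        return partitioningRan ? requestedTasks : numSplits;
    }

    /** Hash-key partitioning: equal keys always land in the same partition. */
    static int hashPartition(String key, int numPartitions) {
        // Mask the sign bit so negative hash codes still map into [0, numPartitions).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Range partitioning of unsorted input: partition i receives keys in
     * (splitPoints[i-1], splitPoints[i]]; splitPoints must be sorted ascending.
     */
    static int rangePartition(int key, int[] splitPoints) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // A miss returns -(insertionPoint) - 1; recover the insertion point.
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        // Local FS case from the thread: one split, two requested tasks, no forcing.
        boolean run = shouldRunPartitioningJob(1, 2, false, false, false);
        System.out.println("run partitioning job: " + run
                + ", tasks: " + effectiveTaskCount(1, 2, run));

        int[] splitPoints = {10, 20}; // three ranges: <= 10, 11..20, > 20
        for (int key : new int[] {25, 3, 17}) {
            System.out.println(key + " -> range partition " + rangePartition(key, splitPoints));
        }
        System.out.println("hama -> hash partition " + hashPartition("hama", 2));
    }
}
```

With a single local-FS split and two requested tasks, this model runs the partitioning job and ends up with two tasks. That is the behaviour Apurv expected, rather than the single task he observed.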
