Hey Apurv, please check HAMA-700.patch_Jan7. Feel free to provide suggestions or even work on it.
Thanks,
Suraj

On Tue, Jan 8, 2013 at 9:21 AM, Apurv Verma <[email protected]> wrote:
> Hey Edward,
> There was a compile bug which I fixed temporarily. isPartitioned was not
> being initialized. Could you please check the last commit? I have currently
> initialized it to false, but I guess this should be configurable.
> There was some JIRA where we wanted partitioning to be skipped if the user
> thinks his data is already partitioned.
>
> Thanks again.
>
> --
> Regards,
> Apurv Verma
>
> On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <[email protected]> wrote:
>
> > Thanks, then I'll finish tomorrow. Please feel free to comment there.
> >
> > On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili <[email protected]> wrote:
> > > thanks Edward, it looks good.
> > > Tommaso
> > >
> > > 2013/1/8 Edward J. Yoon <[email protected]>
> > >
> > >> Please review this:
> > >>
> > >> http://wiki.apache.org/hama/Partitioning
> > >>
> > >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <[email protected]> wrote:
> > >> > I mean, the pre-partitioning or resizing of partitions is really
> > >> > important.
> > >> >
> > >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
> > >> >> This is another talk ...
> > >> >>
> > >> >> Unlike MapReduce, I think, Hama BSP will handle tasks whose input is
> > >> >> small in size but large in computational complexity, such as graph,
> > >> >> sparse matrix, and machine learning algorithms.
> > >> >>
> > >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
> > >> >>> Even though the numbers of splits and tasks are the same, a
> > >> >>> user-defined partitioning job should be run (because it is not only
> > >> >>> for resizing partitions. For example, range partitioning of an
> > >> >>> unsorted data set or hash key partitioning, ..., etc).
> > >> >>>
> > >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
> > >> >>>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's
> > >> >>>>> named as so in the HEAD (1429573) of trunk. It isn't removed but
> > >> >>>>> it isn't referred to anywhere else. I can't find any references
> > >> >>>>> to it in the workspace.
> > >> >>>>>
> > >> >>>>
> > >> >>>> It is referred to in the BSPJob#waitForCompletion function as a
> > >> >>>> separate BSP job to create the specified splits.
> > >> >>>>
> > >> >>>>> 2. job.setPartitioner is the same as setting
> > >> >>>>> "bsp.input.partitioner.class". Anyways, so acc. to me partitions
> > >> >>>>> are not being created, because of which the following happens.
> > >> >>>>> If I am running the task on local fs and not hdfs, there's just
> > >> >>>>> one input split, and even if I set a partitioner to create two
> > >> >>>>> partitions and set bsp.setNumTasks(2), this is overridden and only
> > >> >>>>> one task is executed.
> > >> >>>>> See BSPJobClient#submitJobInternal()
> > >> >>>>> where it does the following
> > >> >>>>> job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
> > >> >>>>> Line 326.
> > >> >>>>>
> > >> >>>>
> > >> >>>> This job is set to run if the number of splits != number of tasks,
> > >> >>>> or if forced by the configuration. I can share the current state of
> > >> >>>> my HAMA-700 patch with you.
> > >> >>>>
> > >> >>>>> 3. So here is what I think is happening: Partitioner is not in
> > >> >>>>> the codepath (try putting a breakpoint inside the partitioner and
> > >> >>>>> executing a non-graph BSP task), so partitions are not being
> > >> >>>>> created and writeSplits() is returning 1.
> > >> >>>>> [ writeSplits() returns the number of splits in the input. ]
> > >> >>>>>
> > >> >>>>
> > >> >>>> Probably because it is running as a separate process?
> > >> >>>
> > >> >>> --
> > >> >>> Best Regards, Edward J. Yoon
> > >> >>> @eddieyoon
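
For reference, the rule discussed in this thread (run the partitioning job when splits != tasks or when forced by configuration, also run it when a user-defined partitioner is set, and skip it when the user says the data is already partitioned) can be sketched roughly as below. This is only an illustration, not code from Hama trunk or from the HAMA-700 patch: the class `PartitioningDecision`, its method names, and all parameters are hypothetical, and giving the already-partitioned flag precedence over the other conditions is an assumption.

```java
// Hypothetical sketch only; none of these names exist in Apache Hama.
final class PartitioningDecision {
    private PartitioningDecision() {}

    /**
     * Decides whether a separate partitioning job should run before the BSP job.
     * Assumption: an "already partitioned" flag wins over every other condition.
     */
    static boolean shouldRunPartitioning(int numSplits, int numTasks,
                                         boolean forcedByConfig,
                                         boolean userPartitionerSet,
                                         boolean inputAlreadyPartitioned) {
        if (inputAlreadyPartitioned) {
            return false; // user says the data is already partitioned: skip
        }
        if (forcedByConfig || userPartitionerSet) {
            return true; // partitioning is not only about resizing partitions
        }
        return numSplits != numTasks; // resize so that splits match tasks
    }

    /** Plain hash-key partitioning: maps a key to a task index in [0, numTasks). */
    static int hashPartition(Object key, int numTasks) {
        // Mask the sign bit so a negative hashCode never yields a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }
}
```

In the local-fs case described above (one input split, two tasks requested, a partitioner set), `shouldRunPartitioning(1, 2, false, true, false)` evaluates to true under these assumptions, so the partitioner would be on the code path instead of the task count being overridden to 1.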
