Thanks, then I'll finish tomorrow. Please feel free to comment there.

On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili <[email protected]> wrote:
> Thanks Edward, it looks good.
> Tommaso
>
> 2013/1/8 Edward J. Yoon <[email protected]>
>
>> Please review this:
>>
>> http://wiki.apache.org/hama/Partitioning
>>
>> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <[email protected]> wrote:
>> > I mean, pre-partitioning or resizing partitions is really important.
>> >
>> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
>> >> This is a separate topic...
>> >>
>> >> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
>> >> small in size but large in computational complexity, such as graph,
>> >> sparse matrix, and machine learning algorithms.
>> >>
>> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
>> >>> Even when the numbers of splits and tasks are the same, the
>> >>> user-defined partitioning job should still be run, because it is not
>> >>> only for resizing partitions (for example, range partitioning of an
>> >>> unsorted data set, hash key partitioning, etc.).
>> >>>
>> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
>> >>>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's
>> >>>>> named that way in HEAD (1429573) of trunk. It isn't removed, but it
>> >>>>> isn't referred to anywhere else. I can't find any references to it in
>> >>>>> the workspace.
>> >>>>
>> >>>> It is referenced in BSPJob#waitForCompletion, where it is run as a
>> >>>> separate BSP job to create the specified splits.
>> >>>>
>> >>>>> 2. job.setPartitioner is the same as setting
>> >>>>> "bsp.input.partitioner.class". Anyway, as far as I can tell,
>> >>>>> partitions are not being created, which causes the following.
>> >>>>> If I run the task on the local file system rather than HDFS, there is
>> >>>>> just one input split, and even if I set a partitioner to create two
>> >>>>> partitions and call bsp.setNumTasks(2), this is overridden and only
>> >>>>> one task is executed. See BSPJobClient#submitJobInternal(), which
>> >>>>> does the following at line 326:
>> >>>>> job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>> >>>>
>> >>>> This job is set to run if the number of splits != the number of tasks,
>> >>>> or if forced by the configuration. I can share the current state of my
>> >>>> HAMA-700 patch with you.
>> >>>>
>> >>>>> 3. So here is what I think is happening: the Partitioner is not in
>> >>>>> the code path (try putting a breakpoint inside the partitioner and
>> >>>>> executing a non-graph BSP task), so partitions are not being created
>> >>>>> and writeSplits() is returning 1.
>> >>>>> [writeSplits() returns the number of splits in the input.]
>> >>>>
>> >>>> Probably because it is running as a separate process?
-- Best Regards, Edward J. Yoon @eddieyoon
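The thread names range partitioning of an unsorted data set and hash key partitioning as reasons to run a user-defined partitioning job even when the split and task counts already match. The sketch below contrasts the two strategies. It is illustrative only: `SimplePartitioner`, `HashKeyPartitioner`, `RangePartitioner`, and `partition` are hypothetical names, not Hama's `Partitioner` API, and the split points 10 and 20 are arbitrary example values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and are not Hama's API.
class PartitioningSketch {

    // Hypothetical contract: map a record key to one of numTasks partitions.
    interface SimplePartitioner<K> {
        int getPartition(K key, int numTasks);
    }

    // Hash key partitioning: equal keys always go to the same partition.
    static class HashKeyPartitioner<K> implements SimplePartitioner<K> {
        @Override
        public int getPartition(K key, int numTasks) {
            // floorMod keeps the index non-negative even for negative hash codes.
            return Math.floorMod(key.hashCode(), numTasks);
        }
    }

    // Range partitioning: each key is routed by comparing it with sorted split points,
    // so the input itself does not need to be sorted.
    static class RangePartitioner implements SimplePartitioner<Integer> {
        private final int[] splitPoints;

        RangePartitioner(int... splitPoints) {
            this.splitPoints = splitPoints.clone();
            Arrays.sort(this.splitPoints);
        }

        @Override
        public int getPartition(Integer key, int numTasks) {
            int p = 0;
            while (p < splitPoints.length && key >= splitPoints[p]) {
                p++;
            }
            // Clamp in case there are more split points than tasks.
            return Math.min(p, numTasks - 1);
        }
    }

    // Routes every key into one bucket per task.
    static <K> List<List<K>> partition(List<K> keys, SimplePartitioner<K> partitioner, int numTasks) {
        List<List<K>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (K key : keys) {
            buckets.get(partitioner.getPartition(key, numTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Integer> unsorted = Arrays.asList(42, 7, 15, 3, 28, 11);
        // Range buckets: [< 10], [10, 20), [>= 20]
        System.out.println(partition(unsorted, new RangePartitioner(10, 20), 3));
        // prints [[7, 3], [15, 11], [42, 28]]
        // Integer.hashCode() is the value itself, so with 2 tasks this splits by parity.
        System.out.println(partition(unsorted, new HashKeyPartitioner<Integer>(), 2));
        // prints [[42, 28], [7, 15, 3, 11]]
    }
}
```

Neither strategy changes the number of partitions here, which is the point made in the thread: the partitioning step can be needed for how records are grouped, not only for resizing.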
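Items 2 and 3 of the thread describe how the requested task count interacts with the number of input splits. The model below restates that logic; it is a hypothetical simplification, not Hama code. `SplitTaskModel` and its methods are invented names, and the model assumes that a partitioning job, when it runs, leaves exactly as many partitions as tasks requested.

```java
// Simplified, hypothetical model of the task-count behaviour discussed in the thread.
// None of these names are Hama APIs; they only restate the described logic.
class SplitTaskModel {

    // Rule as described: the partitioning job runs when the split count differs
    // from the requested task count, or when the configuration forces it.
    static boolean runsUnderDescribedRule(int numSplits, int requestedTasks, boolean forced) {
        return numSplits != requestedTasks || forced;
    }

    // Proposal in the thread: also run it whenever a user-defined partitioner is set,
    // since partitioning is not only about resizing (e.g. range or hash key partitioning).
    static boolean runsUnderProposedRule(int numSplits, int requestedTasks, boolean forced,
                                         boolean userDefinedPartitioner) {
        return runsUnderDescribedRule(numSplits, requestedTasks, forced) || userDefinedPartitioner;
    }

    // Task count at submission. Assumes a successful partitioning job leaves exactly
    // requestedTasks partitions; submission then sets the task count to the split count,
    // as job.setNumBspTask(writeSplits(...)) is described as doing.
    static int effectiveTasks(int inputSplits, int requestedTasks, boolean partitioningJobRan) {
        int splitsAtSubmit = partitioningJobRan ? requestedTasks : inputSplits;
        return splitsAtSubmit;
    }

    public static void main(String[] args) {
        // Local file system case from the thread: one input split, two tasks requested.
        System.out.println(effectiveTasks(1, 2, false)); // prints 1 (partitioner never ran)
        System.out.println(effectiveTasks(1, 2, true));  // prints 2 (partitioning job ran first)
        // Equal counts: skipped under the described rule, run under the proposal.
        System.out.println(runsUnderDescribedRule(2, 2, false));      // prints false
        System.out.println(runsUnderProposedRule(2, 2, false, true)); // prints true
    }
}
```

With one input split and two requested tasks, the model yields a single task unless the partitioning job runs first, which is the symptom reported for the local file system.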
