Please review this: http://wiki.apache.org/hama/Partitioning
On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <[email protected]> wrote:
> I mean, pre-partitioning or resizing partitions is really important.
>
> On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
>> This is a separate topic ...
>>
>> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
>> small in size but large in computational complexity, such as graph,
>> sparse matrix, and machine learning algorithms.
>>
>> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
>>> Even when the numbers of splits and tasks are the same, the user-defined
>>> partitioning job should still be run, because it is not only for resizing
>>> partitions (for example, range partitioning of an unsorted data set,
>>> hash key partitioning, etc.).
>>>
>>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
>>>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's named
>>>>> so in the HEAD (1429573) of trunk. It isn't removed, but it isn't
>>>>> referred to anywhere else. I can't find any references to it in the
>>>>> workspace.
>>>>>
>>>>
>>>> It is referred to in the BSPJob#waitForCompletion function, as a separate
>>>> BSP job that creates the specified splits.
>>>>
>>>>
>>>>> 2. job.setPartitioner is the same as setting
>>>>> "bsp.input.partitioner.class". Anyway, as I understand it, partitions
>>>>> are not being created, which causes the following: if I run the task
>>>>> on the local fs instead of HDFS, there is just one input split, and even
>>>>> if I set a partitioner to create two partitions and call
>>>>> bsp.setNumTasks(2), this is overridden and only one task is executed.
>>>>> See BSPJobClient#submitJobInternal(), where it does the following
>>>>> (line 326):
>>>>>
>>>>>   job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>>>>>
>>>>
>>>> This job is set to run if the number of splits != the number of tasks,
>>>> or if forced by the configuration. I can share the current state of my
>>>> HAMA-700 patch with you.
>>>>
>>>>
>>>>> 3. So here is what I think is happening: the Partitioner is not in the
>>>>> code path (try putting a breakpoint inside the partitioner and executing
>>>>> a non-graph BSP task), so partitions are not being created and
>>>>> writeSplits() is returning 1.
>>>>> [ writeSplits() returns the number of splits in the input. ]
>>>>>
>>>>
>>>> Probably because it is running as a separate process?
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
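For illustration, here is a minimal, self-contained Java sketch of the two schemes mentioned in the thread (range partitioning of an unsorted data set and hash key partitioning), plus the repartitioning condition Suraj describes (splits != tasks, or forced by configuration). The names `KeyPartitioner`, `HashKeyPartitioner`, `RangePartitioner`, `partition`, and `shouldRepartition` are hypothetical; this is not Hama's actual `Partitioner` interface or `PartitioningRunner` code, which may differ in signature and behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; not Hama's real API.
public class PartitioningSketch {

    /** Maps a record key to a task index in [0, numTasks). Hypothetical interface. */
    interface KeyPartitioner<K> {
        int getPartition(K key, int numTasks);
    }

    /** Hash key partitioning: records with equal keys always land in the same task. */
    static final class HashKeyPartitioner<K> implements KeyPartitioner<K> {
        @Override
        public int getPartition(K key, int numTasks) {
            // Mask off the sign bit so negative hash codes still map into [0, numTasks).
            return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
        }
    }

    /** Range partitioning: task i receives keys below upperBounds[i] and at or above upperBounds[i - 1]. */
    static final class RangePartitioner implements KeyPartitioner<Integer> {
        private final int[] upperBounds; // sorted, exclusive upper bounds

        RangePartitioner(int[] upperBounds) {
            this.upperBounds = upperBounds.clone();
        }

        @Override
        public int getPartition(Integer key, int numTasks) {
            int i = 0;
            while (i < upperBounds.length && key >= upperBounds[i]) {
                i++;
            }
            // Clamp in case more bounds were supplied than there are tasks.
            return Math.min(i, numTasks - 1);
        }
    }

    /** Routes every key of an (unsorted) input into numTasks buckets, one per task. */
    static <K> List<List<K>> partition(List<K> keys, KeyPartitioner<K> p, int numTasks) {
        List<List<K>> buckets = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            buckets.add(new ArrayList<>());
        }
        for (K key : keys) {
            buckets.get(p.getPartition(key, numTasks)).add(key);
        }
        return buckets;
    }

    /** Condition as described in the thread: repartition when counts differ or when forced. */
    static boolean shouldRepartition(int numSplits, int numTasks, boolean forcedByConfig) {
        return numSplits != numTasks || forcedByConfig;
    }

    public static void main(String[] args) {
        // A single unsorted input split, as in the local-filesystem case from the thread.
        List<Integer> oneSplit = Arrays.asList(42, 7, 99, 15, 63, 3, 88, 50);
        int numTasks = 2;

        System.out.println("repartition? " + shouldRepartition(1, numTasks, false)); // true
        // range: [[42, 7, 15, 3], [99, 63, 88, 50]]
        System.out.println("range: " + partition(oneSplit, new RangePartitioner(new int[] {50}), numTasks));
        // hash:  [[42, 88, 50], [7, 99, 15, 63, 3]]
        System.out.println("hash:  " + partition(oneSplit, new HashKeyPartitioner<>(), numTasks));
    }
}
```

Note that with splits == tasks and no forcing flag, `shouldRepartition` returns false, which is exactly the case Edward argues should still run the user-defined partitioning job: neither output above depends on the split count, only on the chosen partitioning scheme.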
