Even though the number of splits and the number of tasks are the same, the user-defined partitioning job should still be run, because it is not only for resizing partitions. For example: range partitioning of an unsorted data set, hash key partitioning, etc.
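A rough sketch of the point above (the class and method names here are illustrative assumptions, not Hama's actual Partitioner API): even when the split count already equals the task count, a hash key partitioner routes each record by its key's content rather than by which split it came from, so a repartitioning pass is still required.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashPartitionSketch {
    // Route a key to one of numPartitions buckets by its hash.
    // floorMod keeps the result non-negative for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // One input split containing four keys; with two tasks, the keys
        // must still be redistributed by hash, not left in arrival order.
        List<String> keys = Arrays.asList("alpha", "beta", "gamma", "delta");
        Map<Integer, List<String>> partitions = new HashMap<>();
        for (String k : keys) {
            partitions.computeIfAbsent(partitionFor(k, 2), p -> new ArrayList<>()).add(k);
        }
        System.out.println(partitions);
    }
}
```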
On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:

>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's named
>> as such in the HEAD (1429573) of trunk. It isn't removed, but it isn't
>> referred to anywhere else. I can't find any references to it in the
>> workspace.
>
> It is referred to in the BSPJob#waitForCompletion function as a separate
> BSP job to create the specified splits.
>
>> 2. job.setPartitioner is the same as setting
>> "bsp.input.partitioner.class". Anyway, according to me the partitions
>> are not being created, because of which the following happens:
>> If I am running the task on the local fs and not HDFS, there is just one
>> input split, and even if I set a partitioner to create two partitions
>> and set bsp.setNumTasks(2), this is overridden and only one task is
>> executed.
>> See BSPJobClient#submitJobInternal(), where it does the following at
>> line 326:
>> job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>
> This job is set to run if the number of splits != number of tasks, or if
> forced by the configuration. I can share the current state of my HAMA-700
> patch with you.
>
>> 3. So here is what I think is happening: the partitioner is not in the
>> code path (try putting a breakpoint inside the partitioner and executing
>> a non-graph BSP task), so partitions are not being created and
>> writeSplits() is returning 1.
>> [ writeSplits() returns the number of splits in the input. ]
>
> Probably because it is running as a separate process?

-- 
Best Regards, Edward J. Yoon
@eddieyoon
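The override described in point 2 can be sketched as follows. This is a minimal illustration, not Hama code: only the names setNumBspTask and writeSplits come from the quoted thread; SubmitSketch, requestedTasks, splitsOnDisk, and the body of writeSplits are assumptions made for the example.

```java
public class SubmitSketch {
    static int requestedTasks;

    // Stand-in for BSPJob#setNumBspTask: records the task count.
    static void setNumBspTask(int n) {
        requestedTasks = n;
    }

    // Stand-in for BSPJobClient#writeSplits: on a local fs with a single
    // input file and no partitioning pass, there is exactly one split.
    static int writeSplits(int splitsOnDisk, int maxTasks) {
        return Math.min(splitsOnDisk, maxTasks);
    }

    public static void main(String[] args) {
        setNumBspTask(2);                    // user asks for two tasks
        // submitJobInternal then overwrites the request with the split
        // count, mirroring line 326 quoted above:
        setNumBspTask(writeSplits(1, 100));
        System.out.println(requestedTasks);  // prints 1: the single split wins
    }
}
```

This is why setting the task count to 2 has no effect when the partitioner never runs: whatever writeSplits returns becomes the final task count.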
