I mean, pre-partitioning or resizing partitions is really important.

On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
> This is another talk ...
>
> Unlike MapReduce, I think, Hama BSP will handle tasks whose input is
> small in size but large in computational complexity, such as graph,
> sparse matrix, machine learning algorithms.
>
> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
>> Even though the numbers of splits and tasks are the same, user-defined
>> partitioning job should be run (because it is not only for resizing
>> partitions. For example, range partitioning of unsorted data set or
>> hash key partitioning, ..., etc).
>>
>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
>>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner, it's named
>>>> as so in the HEAD (1429573) of trunk. It isn't removed but it isn't
>>>> referred to anywhere else. I can't find any references to it in the
>>>> workspace.
>>>>
>>>
>>> It is referred to in the BSPJob#waitForCompletion function as a separate
>>> BSP job to create the specified splits.
>>>
>>>
>>>> 2. job.setPartitioner is the same as setting
>>>> "bsp.input.partitioner.class". Anyways, so acc. to me partitions are not
>>>> being created, because of which the following happens.
>>>> If I am running the task on local fs and not hdfs, there's just one
>>>> input split and even if I set a partitioner to create two partitions and
>>>> set bsp.setNumTasks(2), this is overridden and only one task is executed.
>>>> See BSPJobClient#submitJobInternal()
>>>> where it does the following
>>>> job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks)); Line 326.
>>>>
>>>
>>> This job is set to run if the number of splits != number of Tasks or if
>>> forced by the configuration. I can share my HAMA-700 current state of patch
>>> with you.
>>>
>>>
>>>> 3. So here is what I think is happening, Partitioner is not in the
>>>> codepath (try putting a breakpoint inside the partitioner and executing
>>>> a non graph bsp task), so partitions are not being created and
>>>> writeSplits() is returning 1.
>>>> [ writeSplits() returns the number of splits in the input. ]
>>>>
>>>
>>> Probably because it is running as a separate process?
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
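
To make the point about user-defined partitioning concrete, here is a minimal
Java sketch. It does not use any Hama API: PartitionSketch, hashPartition,
rangePartition and repartitionByRange are hypothetical names made up for
illustration, and the sample records are invented. It only shows that with two
input splits and two tasks the counts already match, yet range partitioning of
unsorted input still has to move records between partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; none of these names come from Hama.
class PartitionSketch {

    // Hash key partitioning: the same key always maps to the same task.
    // Masking the sign bit keeps the result non-negative.
    static int hashPartition(String key, int numTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    // Range partitioning: upperBounds holds sorted, exclusive upper bounds.
    // Keys at or above the last bound go to the final partition.
    static int rangePartition(int key, int[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    // Redistribute the records of every input split into range partitions.
    static List<List<Integer>> repartitionByRange(List<List<Integer>> splits,
                                                  int[] upperBounds) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i <= upperBounds.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (List<Integer> split : splits) {
            for (int record : split) {
                partitions.get(rangePartition(record, upperBounds)).add(record);
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Two unsorted splits and two tasks: the counts already match.
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(7, 1, 9),
                Arrays.asList(2, 8, 3));
        int[] upperBounds = {5}; // partition 0: key < 5, partition 1: key >= 5

        // Prints [[1, 2, 3], [7, 9, 8]]: records crossed split boundaries.
        System.out.println(repartitionByRange(splits, upperBounds));

        // A given key lands on the same task every time it is hashed.
        System.out.println(hashPartition("vertex-1", 2) == hashPartition("vertex-1", 2));
    }
}
```

So a check of the form "number of splits != number of tasks" alone would skip
the partitioning step in exactly this case, which is why a user-defined
partitioner should probably trigger it regardless of the counts.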
--
Best Regards, Edward J. Yoon
@eddieyoon
