This is another talk ... Unlike MapReduce, I think, Hama BSP will handle tasks whose input is small in size but large in computational complexity, such as graph, sparse matrix, and machine learning algorithms.
On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
> Even though the numbers of splits and tasks are the same, the
> user-defined partitioning job should be run (because it is not only for
> resizing partitions; for example, range partitioning of an unsorted data
> set or hash key partitioning, ..., etc.).
>
> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
>>> 1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's named
>>> as such in the HEAD (1429573) of trunk. It isn't removed, but it isn't
>>> referred to anywhere else. I can't find any references to it in the
>>> workspace.
>>>
>>
>> It is referred to in the BSPJob#waitForCompletion function as a separate
>> BSP job that creates the specified splits.
>>
>>
>>> 2. job.setPartitioner is the same as setting
>>> "bsp.input.partitioner.class". Anyway, as I see it, partitions are not
>>> being created, which is why the following happens:
>>> If I am running the task on the local fs and not HDFS, there's just one
>>> input split, and even if I set a partitioner to create two partitions
>>> and set bsp.setNumTasks(2), this is overridden and only one task is
>>> executed.
>>> See BSPJobClient#submitJobInternal(), where it does the following:
>>> job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks)); (line
>>> 326)
>>>
>> This job is set to run if the number of splits != the number of tasks,
>> or if forced by the configuration. I can share the current state of my
>> HAMA-700 patch with you.
>>
>>
>>> 3. So here is what I think is happening: the Partitioner is not in the
>>> code path (try putting a breakpoint inside the partitioner and
>>> executing a non-graph BSP task), so partitions are not being created
>>> and writeSplits() is returning 1.
>>> [ writeSplits() returns the number of splits in the input. ]
>>>
>>
>> Probably because it is running as a separate process?
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
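For context, the "hash key partitioning" mentioned in the thread can be sketched as below. This is a minimal, self-contained illustration of the idea only; the `Partitioner` interface here is a hypothetical stand-in, not Hama's actual `org.apache.hama.bsp.Partitioner` class, and the class/method names are assumptions for the sketch.

```java
// Minimal sketch of hash-key partitioning: records with the same key are
// always routed to the same task, regardless of input order. The interface
// below is a hypothetical stand-in, not Hama's real Partitioner API.
public class HashPartitionSketch {

    // Hypothetical partitioner contract: map a key to one of numTasks buckets.
    interface Partitioner<K> {
        int getPartition(K key, int numTasks);
    }

    // Hash-key partitioning: mask off the sign bit so the result is
    // non-negative, then take the modulus to land in [0, numTasks).
    static final Partitioner<String> HASH = (key, numTasks) ->
            (key.hashCode() & Integer.MAX_VALUE) % numTasks;

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma", "alpha"};
        int numTasks = 2;
        for (String k : keys) {
            // Identical keys ("alpha" twice) always map to the same partition.
            System.out.println(k + " -> partition " + HASH.getPartition(k, numTasks));
        }
    }
}
```

The point relevant to the thread: for a partitioning like this to take effect, the partitioner actually has to be in the code path before the task count is fixed; if writeSplits() decides the task count from the raw input splits alone, the configured number of partitions is overridden.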
