Let's not use the term "runtime partitioning" for now. Originally:

 * Partitioning was handled by a single client-side 'BSPJobClient'.
 * There was also separate partition-processing logic in GraphJobRunner,
   called run-time partitioning.

Now, by using a BSP job to partition the input data, we can process read
and write operations in parallel. Data locality is also preserved, at least
for read operations. Above all, we can now specify the number of BSP tasks.

If we want to implement network-based run-time partitioning, it should be
processed internally before BSP's setup() method. I think we can hold off
on run-time partitioning until later.

On Wed, Jan 9, 2013 at 8:56 AM, Suraj Menon <[email protected]> wrote:
>> Keeping run-time (network-based) partitioning within GraphJobRunner is
>> not good idea.
>
> It is not. I think I got testSubmitGraph to runtime partition (in
> preprocessing step) the single file into 2 files in the unit tests in my
> current state of patch..
>
>> >> - the number of splits found are not equal to the number of BSP tasks
>> >> configured for the job. OR
>>
>> I have a question. If the input is unsorted map and I want to
>> re-partition by hashing but the numbers of blocks and desired tasks
>> are same, then what happens? Do you mean run-time partitioning?
>
> You will have runtime partitioner class defined and partitioning flag on by
> default. For case of HAMA-561 a user can switch off partitioning using the
> same flag.
>
>> On Wed, Jan 9, 2013 at 7:07 AM, Suraj Menon <[email protected]>
>> wrote:
>> > Hi Apurv, yes, those are pending test cases to be fixed. GraphJobRunner is
>> > expecting the input in the format of Vertex, but we have input files as
>> > well as record key, values defined as Text. I have fixed only one unit test
>> > case yet.
>> >
>> > On Tue, Jan 8, 2013 at 4:45 PM, Apurv Verma <[email protected]> wrote:
>> >
>> >> Hey all,
>> >> I got the problem, the partitioner was not being set for the
>> >> PartitionerRunner bsp task.
>> >> :P I have fixed the partitioner with portions
>> >> from your patch Suraj. Now after this commit partitioner will obey what you
>> >> specified earlier, just to recapitulate.
>> >>
>> >> Repartitioning is done if :
>> >> - the number of splits found are not equal to the number of BSP tasks
>> >> configured for the job. OR
>> >> - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
>> >> - user has specified a Runtime Partitioner class and enabled runtime
>> >> partitioning
>> >>
>> >> There was one special thing that I discovered about partitioner , just
>> >> sharing with you guys. Suppose I implement a partitioner which returns 0
>> >> for a record, then it isn't necessary that this record will go to peer with
>> >> index 0. It might go to peer 1. The only certitude which partitioner's
>> >> provide is that all records returning 0 will go to the same peer. I needed
>> >> partitioner to work for PrefixSum I was implementing.
>> >>
>> >> Things to do next.
>> >> 1) RecordConverter , which Suraj is implementing in HAMA-700. (Please
>> >> update Suraj)
>> >>
>> >> B.T.W there are problems in mvn test.
>> >> *java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
>> >> org.apache.hadoop.io.ArrayWritable*
>> >> * at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:287)*
>> >>
>> >> I don't think my commit is breaking this.
>> >>
>> >> Thanks
>> >>
>> >> --
>> >> Regards,
>> >> Apurv Verma
>> >>
>> >> On Tue, Jan 8, 2013 at 11:07 PM, Suraj Menon <[email protected]>
>> >> wrote:
>> >>
>> >> > Please explain the nature of problems you are facing with Partitioner?
>> >> >
>> >> > >Any reasons for deciding to move the
>> >> > > PartitioningJob inside BSPJobClient from BSPJob?
>> >> >
>> >> > Twofold, BSPJob was just a configuration holder object, didn't want to add
>> >> > the partitioning responsibility to the class.
>> >> > And also I wanted to know the number of splits, before taking the decision
>> >> > whether to repartition or not.
>> >> > Repartitioning is done if :
>> >> > - the number of splits found are not equal to the number of BSP tasks
>> >> > configured for the job. OR
>> >> > - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
>> >> > - user has specified a Runtime Partitioner class and enabled runtime
>> >> > partitioning
>> >> >
>> >> > Thanks,
>> >> > Suraj
>> >> >
>> >> > On Tue, Jan 8, 2013 at 11:31 AM, Apurv Verma <[email protected]> wrote:
>> >> >
>> >> > > Thanks, let me have a careful look at it. On a cursory look, I seem to
>> >> > > understand the basic idea. Any reasons for deciding to move the
>> >> > > PartitioningJob inside BSPJobClient from BSPJob?
>> >> > > BTW the current partitioner doesn't work as intended, only the default
>> >> > > partitioner HashPartitioner works fine, if I try to put some custom
>> >> > > partitioner there are problems.
>> >> > >
>> >> > > Let's resolve the partitioning completely before the spilling message
>> >> > > queue.
>> >> > >
>> >> > > --
>> >> > > Regards,
>> >> > > Apurv Verma

--
Best Regards, Edward J. Yoon
@eddieyoon
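
For reference, the repartitioning rule quoted in this thread can be sketched roughly as below. This is only an illustration of the three conditions as they are written in the mails, not the actual BSPJobClient code: the class name RepartitionDecision, the method shouldRepartition and its parameters are hypothetical. Only the configuration key "bsp.input.runtime.partitioning" comes from the thread. Note that, read literally, the third condition already implies the second if "enabled" refers to the same flag; the sketch keeps all three so it mirrors the list.

```java
// Hypothetical sketch of the repartitioning rule described in the thread.
// None of these names are taken from the Hama code base.
final class RepartitionDecision {

  // Configuration key mentioned in the thread; shown here for reference only.
  static final String RUNTIME_PARTITIONING_KEY = "bsp.input.runtime.partitioning";

  /**
   * @param numSplits                 number of input splits found
   * @param numBspTasks               number of BSP tasks configured for the job
   * @param runtimePartitioningFlag   value of the runtime partitioning flag
   * @param partitionerClassSpecified whether the user named a partitioner class
   */
  static boolean shouldRepartition(int numSplits, int numBspTasks,
      boolean runtimePartitioningFlag, boolean partitionerClassSpecified) {
    // Condition 1: splits and configured tasks do not match.
    boolean splitMismatch = numSplits != numBspTasks;
    // Condition 3: a partitioner class is given and runtime partitioning is on.
    boolean partitionerRequested = partitionerClassSpecified && runtimePartitioningFlag;
    // Condition 2 is the flag itself; any one condition is enough.
    return splitMismatch || runtimePartitioningFlag || partitionerRequested;
  }

  public static void main(String[] args) {
    System.out.println(shouldRepartition(1, 2, false, false)); // true: split mismatch
    System.out.println(shouldRepartition(2, 2, false, false)); // false: nothing applies
    System.out.println(shouldRepartition(2, 2, true, false));  // true: flag is set
    System.out.println(shouldRepartition(2, 2, false, true));  // false: class given, flag off
  }
}
```

Under this reading, the case asked about above (same number of blocks and tasks, but hashing wanted) repartitions only when the flag is on, which matches the reply that the flag would be on by default and switched off for HAMA-561.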
