Sorry, I was confused about the term. ;)

On Wed, Jan 9, 2013 at 3:00 PM, Edward J. Yoon <[email protected]> wrote:
> Let's don't use the term "runtime partitioning" at this time.
>
> Originally,
>
> * Partitioning was handled by single client-side 'BSPJobClient'.
> * And, there were separate partition processing logic in
>   GraphJobRunner, called run-time partitioning.
>
> And now, by using BSP job for partitioning input-data, we can process
> read and write operations in parallel. Also, data locality is
> preserved at least for read operations. Above all things, we can
> specify the number of BSP tasks now.
>
> If we want to implement network-based run-time partitioning, it should
> be processed before BSP's setup() method internally. I think we can
> hold the run-time partitioning for later on.
>
> On Wed, Jan 9, 2013 at 8:56 AM, Suraj Menon <[email protected]> wrote:
>>> Keeping run-time (network-based) partitioning within GraphJobRunner is
>>> not good idea.
>>
>> It is not. I think I got testSubmitGraph to runtime partition (in
>> preprocessing step) the single file into 2 files in the unit tests in my
>> current state of patch..
>>
>>> >> - the number of splits found are not equal to the number of BSP tasks
>>> >> configured for the job. OR
>>>
>>> I have a question. If the input is unsorted map and I want to
>>> re-partition by hashing but the numbers of blocks and desired tasks
>>> are same, then what happens? Do you mean run-time partitioning?
>>
>> You will have runtime partitioner class defined and partitioning flag on by
>> default. For case of HAMA-561 a user can switch off partitioning using the
>> same flag.
>>
>>> On Wed, Jan 9, 2013 at 7:07 AM, Suraj Menon <[email protected]> wrote:
>>> > Hi Apurv, yes, those are pending test cases to be fixed. GraphJobRunner is
>>> > expecting the input in the format of Vertex, but we have input files as
>>> > well as record key, values defined as Text. I have fixed only one unit test
>>> > case yet.
>>> >
>>> > On Tue, Jan 8, 2013 at 4:45 PM, Apurv Verma <[email protected]> wrote:
>>> >
>>> >> Hey all,
>>> >> I got the problem, the partitioner was not being set for the
>>> >> PartitionerRunner bsp task. :P I have fixed the partitioner with portions
>>> >> from your patch Suraj. Now after this commit partitioner will obey what you
>>> >> specified earlier, just to recapitulate.
>>> >>
>>> >> Repartitioning is done if :
>>> >> - the number of splits found are not equal to the number of BSP tasks
>>> >> configured for the job. OR
>>> >> - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
>>> >> - user has specified a Runtime Partitioner class and enabled runtime
>>> >> partitioning
>>> >>
>>> >> There was one special thing that I discovered about partitioner , just
>>> >> sharing with you guys. Suppose I implement a partitioner which returns 0
>>> >> for a record, then it isn't necessary that this record will go to peer with
>>> >> index 0. It might go to peer 1. The only certitude which partitioner's
>>> >> provide is that all records returning 0 will go to the same peer. I needed
>>> >> partitioner to work for PrefixSum I was implementing.
>>> >>
>>> >> Things to do next.
>>> >> 1) RecordConverter , which Suraj is implementing in HAMA-700. (Please
>>> >> update Suraj)
>>> >>
>>> >> B.T.W there are problems in mvn test.
>>> >> *java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
>>> >> org.apache.hadoop.io.ArrayWritable*
>>> >> * at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:287)*
>>> >>
>>> >> I don't think my commit is breaking this.
>>> >>
>>> >> Thanks
>>> >>
>>> >> --
>>> >> Regards,
>>> >> Apurv Verma
>>> >>
>>> >> On Tue, Jan 8, 2013 at 11:07 PM, Suraj Menon <[email protected]> wrote:
>>> >>
>>> >> > Please explain the nature of problems you are facing with Partitioner?
>>> >> >
>>> >> > > Any reasons for deciding to move the
>>> >> > > PartitioningJob inside BSPJobClient from BSPJob?
>>> >> >
>>> >> > Twofold, BSPJob was just a configuration holder object, didn't want to add
>>> >> > the partitioning responsibility to the class.
>>> >> > And also I wanted to know the number of splits, before taking the decision
>>> >> > whether to repartition or not.
>>> >> > Repartitioning is done if :
>>> >> > - the number of splits found are not equal to the number of BSP tasks
>>> >> > configured for the job. OR
>>> >> > - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
>>> >> > - user has specified a Runtime Partitioner class and enabled runtime
>>> >> > partitioning
>>> >> >
>>> >> > Thanks,
>>> >> > Suraj
>>> >> >
>>> >> > On Tue, Jan 8, 2013 at 11:31 AM, Apurv Verma <[email protected]> wrote:
>>> >> >
>>> >> > > Thanks, let me have a careful look at it. On a cursory look, I seem to
>>> >> > > understand the basic idea. Any reasons for deciding to move the
>>> >> > > PartitioningJob inside BSPJobClient from BSPJob?
>>> >> > > BTW the current partitioner doesn't work as intended, only the default
>>> >> > > partitioner HashPartitioner works fine, if I try to put some custom
>>> >> > > partitioner there are problems.
>>> >> > >
>>> >> > > Let's resolve the partitioning completely before the spilling message
>>> >> > > queue.
>>> >> > >
>>> >> > > --
>>> >> > > Regards,
>>> >> > > Apurv Verma
>>> >
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
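
For reference, the "Repartitioning is done if" rule quoted in this thread could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the class name RepartitionDecision, the method needsRepartitioning, and the key "example.runtime.partitioner.class" are made up for the example. Only "bsp.input.runtime.partitioning" comes from the thread, and treating it as false when unset is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the name and shape are illustrative, not Hama's API.
class RepartitionDecision {
    // Flag name taken from the thread.
    static final String RUNTIME_PARTITIONING_FLAG = "bsp.input.runtime.partitioning";
    // Made-up key standing in for "user has specified a Runtime Partitioner class".
    static final String PARTITIONER_CLASS_KEY = "example.runtime.partitioner.class";

    static boolean needsRepartitioning(int numSplits, int numBspTasks,
                                       Map<String, String> conf) {
        // Assumption: the flag counts as off when it is not set.
        boolean flagOn = Boolean.parseBoolean(
                conf.getOrDefault(RUNTIME_PARTITIONING_FLAG, "false"));
        boolean partitionerSpecified = conf.containsKey(PARTITIONER_CLASS_KEY);

        // 1) the number of splits differs from the number of BSP tasks, OR
        // 2) the user set the flag to true, OR
        // 3) a runtime partitioner class is specified and partitioning is enabled.
        return numSplits != numBspTasks
                || flagOn
                || (partitionerSpecified && flagOn);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(needsRepartitioning(2, 2, conf)); // same counts, flag unset
        System.out.println(needsRepartitioning(1, 2, conf)); // counts differ
        conf.put(RUNTIME_PARTITIONING_FLAG, "true");
        System.out.println(needsRepartitioning(2, 2, conf)); // flag on
    }
}
```

Note that if "enabled" in the third condition means the same flag as in the second, the third condition is already covered by the second in this sketch. It is kept only so the code mirrors the list as written.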

--
Best Regards, Edward J. Yoon
@eddieyoon
