Re: Partitioner in Hama

Edward J. Yoon Tue, 08 Jan 2013 15:40:21 -0800

Keeping run-time (network-based) partitioning within GraphJobRunner is
not a good idea.


>> - the number of splits found are not equal to the number of BSP tasks
>> configured for the job. OR

I have a question. If the input is an unsorted map and I want to
re-partition it by hashing, but the numbers of blocks and desired tasks
are the same, then what happens? Do you mean run-time partitioning?
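
For reference, the three conditions listed in the quoted mail below can be
written as a single predicate. This is only a minimal sketch: the class name,
method name and parameters are hypothetical and are not part of Hama's API.

```java
// Hypothetical helper, not Hama code: it only restates the three
// repartitioning conditions from this thread as one boolean expression.
final class RepartitionRule {

    static boolean shouldRepartition(int numSplits, int numBspTasks,
                                     boolean runtimePartitioningFlag,
                                     boolean partitionerClassSet) {
        // 1) number of splits found != number of BSP tasks configured
        boolean sizeMismatch = numSplits != numBspTasks;
        // 2) user set "bsp.input.runtime.partitioning" to true
        boolean forcedByUser = runtimePartitioningFlag;
        // 3) user specified a partitioner class AND enabled runtime
        //    partitioning (as literally stated, this is implied by 2;
        //    it is kept only to mirror the list)
        boolean customPartitioner = partitionerClassSet && runtimePartitioningFlag;
        return sizeMismatch || forcedByUser || customPartitioner;
    }

    public static void main(String[] args) {
        // Equal counts, partitioner class set, flag not set.
        System.out.println(shouldRepartition(4, 4, false, true));
        // Equal counts, partitioner class set, flag set.
        System.out.println(shouldRepartition(4, 4, true, true));
        // One split, two tasks, nothing else configured.
        System.out.println(shouldRepartition(1, 2, false, false));
    }
}
```

Read this way, equal numbers of splits and tasks plus a partitioner class but
no flag would skip the partitioning job, which is the case I am asking about.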

On Wed, Jan 9, 2013 at 7:07 AM, Suraj Menon <[email protected]> wrote:
> Hi Apurv, yes, those are pending test cases to be fixed. GraphJobRunner
> expects its input in the Vertex format, but the input files, as well as
> the record keys and values, are defined as Text. I have fixed only one
> unit test case so far.
>
> On Tue, Jan 8, 2013 at 4:45 PM, Apurv Verma <[email protected]> wrote:
>
>> Hey all,
>>  I found the problem: the partitioner was not being set for the
>> PartitionerRunner bsp task. :P I have fixed the partitioner with portions
>> from your patch, Suraj. After this commit the partitioner will obey what
>> you specified earlier. Just to recapitulate:
>>
>> Repartitioning is done if :
>> - the number of splits found are not equal to the number of BSP tasks
>> configured for the job. OR
>> - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
>> - user has specified a Runtime Partitioner class and enabled runtime
>> partitioning
>>
>> There is one notable thing I discovered about the partitioner that I want
>> to share with you. Suppose I implement a partitioner which returns 0
>> for a record; it isn't guaranteed that this record will go to the peer with
>> index 0. It might go to peer 1. The only guarantee the partitioner
>> provides is that all records returning 0 will go to the same peer. I needed
>> the partitioner to work for the PrefixSum I was implementing.
>>
>> Things to do next.
>> 1) RecordConverter, which Suraj is implementing in HAMA-700. (Please
>> update, Suraj.)
>>
>> B.T.W there are problems in mvn test.
>> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
>> org.apache.hadoop.io.ArrayWritable
>>  at org.apache.hama.graph.GraphJobRunner.loadVertices(GraphJobRunner.java:287)
>>
>> I don't think my commit is breaking this.
>>
>> Thanks
>>
>>
>> --
>> Regards,
>> Apurv Verma
>>
>>
>>
>>
>> On Tue, Jan 8, 2013 at 11:07 PM, Suraj Menon <[email protected]>
>> wrote:
>>
>> > Please explain the nature of problems you are facing with Partitioner?
>> >
>> > >Any reasons for deciding to move the
>> > > PartitioningJob inside BSPJobClient from BSPJob?
>> >
>> > Twofold. BSPJob was just a configuration holder object, and I didn't want
>> > to add the partitioning responsibility to the class.
>> > Also, I wanted to know the number of splits before deciding whether to
>> > repartition or not.
>> > Repartitioning is done if :
>> > - the number of splits found are not equal to the number of BSP tasks
>> > configured for the job. OR
>> > - the flag is set to true by the user ("bsp.input.runtime.partitioning") OR
>> > - user has specified a Runtime Partitioner class and enabled runtime
>> > partitioning
>> >
>> > Thanks,
>> > Suraj
>> >
>> > On Tue, Jan 8, 2013 at 11:31 AM, Apurv Verma <[email protected]> wrote:
>> >
>> > > Thanks, let me have a careful look at it. On a cursory look, I seem to
>> > > understand the basic idea. Any reasons for deciding to move the
>> > > PartitioningJob inside BSPJobClient from BSPJob?
>> > > BTW, the current partitioner doesn't work as intended. Only the default
>> > > partitioner, HashPartitioner, works fine; if I try to plug in a custom
>> > > partitioner, there are problems.
>> > >
>> > > Let's resolve the partitioning completely before the spilling message
>> > > queue.
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > Apurv Verma
>> > >
>> > >
>> > >
>> > >
>> > > On Tue, Jan 8, 2013 at 8:39 PM, Suraj Menon <[email protected]>
>> > > wrote:
>> > >
>> > > > Hey Apurv, please check HAMA-700.patch_Jan7. Feel free to provide
>> > > > suggestions or even work on it.
>> > > >
>> > > > Thanks,
>> > > > Suraj
>> > > >
>> > > > On Tue, Jan 8, 2013 at 9:21 AM, Apurv Verma <[email protected]>
>> wrote:
>> > > >
>> > > > > Hey Edward,
>> > > > >  There was a compile bug which I fixed temporarily: isPartitioned was
>> > > > > not being initialized. Could you please check the last commit? I have
>> > > > > currently initialized it to false, but I guess this should be
>> > > > > configurable. There was a JIRA issue where we wanted partitioning to
>> > > > > be skipped if the user thinks his data is already partitioned.
>> > > > >
>> > > > > Thanks again.
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Regards,
>> > > > > Apurv Verma
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <
>> > [email protected]
>> > > > > >wrote:
>> > > > >
>> > > > > > Thanks, then I'll finish tomorrow. Please feel free to comment there.
>> > > > > >
>> > > > > > On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili
>> > > > > > <[email protected]> wrote:
>> > > > > > > thanks Edward, it looks good.
>> > > > > > > Tommaso
>> > > > > > >
>> > > > > > >
>> > > > > > > 2013/1/8 Edward J. Yoon <[email protected]>
>> > > > > > >
>> > > > > > >> Please review this:
>> > > > > > >>
>> > > > > > >> http://wiki.apache.org/hama/Partitioning
>> > > > > > >>
>> > > > > > >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <
>> > > > [email protected]
>> > > > > >
>> > > > > > >> wrote:
>> > > > > > >> > I mean, pre-partitioning or resizing partitions is really important.
>> > > > > > >> >
>> > > > > > >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <
>> > > > > [email protected]
>> > > > > > >
>> > > > > > >> wrote:
>> > > > > > >> >> This is another talk ...
>> > > > > > >> >>
>> > > > > > >> >> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
>> > > > > > >> >> small in size but large in computational complexity, such as graph,
>> > > > > > >> >> sparse matrix, and machine learning algorithms.
>> > > > > > >> >>
>> > > > > > >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <
>> > > > > > [email protected]>
>> > > > > > >> wrote:
>> > > > > > >> >>> Even when the numbers of splits and tasks are the same, a user-defined
>> > > > > > >> >>> partitioning job should still be run, because partitioning is not only
>> > > > > > >> >>> for resizing partitions. Examples are range partitioning of an unsorted
>> > > > > > >> >>> data set, hash key partitioning, etc.
>> > > > > > >> >>>
>> > > > > > >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <
>> > > > > [email protected]
>> > > > > > >
>> > > > > > >> wrote:
>> > > > > > >> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's
>> > > > > > >> >>>>>    named so in the HEAD (1429573) of trunk. It isn't removed, but it
>> > > > > > >> >>>>>    isn't referred to anywhere else. I can't find any references to it
>> > > > > > >> >>>>>    in the workspace.
>> > > > > > >> >>>>>
>> > > > > > >> >>>>
>> > > > > > >> >>>> It is referred to in the BSPJob#waitForCompletion function, as a
>> > > > > > >> >>>> separate BSP job that creates the specified splits.
>> > > > > > >> >>>>
>> > > > > > >> >>>>
>> > > > > > >> >>>>>    2. job.setPartitioner is the same as setting
>> > > > > > >> >>>>>    "bsp.input.partitioner.class". Anyway, in my view partitions are not
>> > > > > > >> >>>>>    being created, which leads to the following.
>> > > > > > >> >>>>>    If I run the task on the local fs and not hdfs, there's just one
>> > > > > > >> >>>>>    input split, and even if I set a partitioner to create two
>> > > > > > >> >>>>>    partitions and set bsp.setNumTasks(2), this is overridden and only
>> > > > > > >> >>>>>    one task is executed.
>> > > > > > >> >>>>>    See BSPJobClient#submitJobInternal(), where it does the following
>> > > > > > >> >>>>>    (line 326):
>> > > > > > >> >>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
>> > > > > > >> >>>>>
>> > > > > > >> >>>> This job is set to run if the number of splits != number of tasks,
>> > > > > > >> >>>> or if forced by the configuration. I can share the current state of
>> > > > > > >> >>>> my HAMA-700 patch with you.
>> > > > > > >> >>>>
>> > > > > > >> >>>>
>> > > > > > >> >>>>>    3. So here is what I think is happening: the Partitioner is not in
>> > > > > > >> >>>>>    the codepath (try putting a breakpoint inside the partitioner and
>> > > > > > >> >>>>>    executing a non-graph bsp task), so partitions are not being
>> > > > > > >> >>>>>    created and writeSplits() is returning 1.
>> > > > > > >> >>>>>    [ writeSplits() returns the number of splits in the input. ]
>> > > > > > >> >>>>>
>> > > > > > >> >>>>
>> > > > > > >> >>>> Probably because it is running as a separate process?
>> > > > > > >> >>>
>> > > > > > >> >>>
>> > > > > > >> >>>
>> > > > > > >> >>> --
>> > > > > > >> >>> Best Regards, Edward J. Yoon
>> > > > > > >> >>> @eddieyoon
>> > > > > > >> >>
>> > > > > > >> >>
>> > > > > > >> >>
>> > > > > > >> >> --
>> > > > > > >> >> Best Regards, Edward J. Yoon
>> > > > > > >> >> @eddieyoon
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > --
>> > > > > > >> > Best Regards, Edward J. Yoon
>> > > > > > >> > @eddieyoon
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> --
>> > > > > > >> Best Regards, Edward J. Yoon
>> > > > > > >> @eddieyoon
>> > > > > > >>
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best Regards, Edward J. Yoon
>> > > > > > @eddieyoon
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
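
P.S. Apurv's observation quoted above (a partitioner returning 0 does not
mean peer 0, only that all records returning 0 share a peer) can be
illustrated with a small sketch. Everything here is an assumption for
illustration: the class, the hash-based partitionOf, and the rotating
peerOf assignment are hypothetical and do not describe how Hama actually
assigns partitions to peers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only, not Hama code.
final class PartitionToPeerDemo {

    // Hash partitioning: a non-negative partition id in [0, numPartitions).
    static int partitionOf(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Assumed partition-to-peer assignment: a fixed rotation by `offset`.
    // Any fixed one-to-one mapping would show the same property.
    static int peerOf(int partitionId, int numPeers, int offset) {
        return Math.floorMod(partitionId + offset, numPeers);
    }

    // For every partition id, collect the set of peers its records reach.
    static Map<Integer, Set<Integer>> peersPerPartition(List<String> keys,
                                                        int numPeers, int offset) {
        Map<Integer, Set<Integer>> seen = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionOf(key, numPeers);
            seen.computeIfAbsent(partition, k -> new TreeSet<>())
                .add(peerOf(partition, numPeers, offset));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "a", "d");
        // With offset 1 and two peers, partition 0 lands on peer 1, yet each
        // partition id still maps to exactly one peer.
        System.out.println(peersPerPartition(keys, 2, 1));
    }
}
```

An algorithm such as PrefixSum that depends on partition 0 being on peer 0
would need the identity assignment (offset 0 in this sketch) to be guaranteed.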
