Could you explain the nature of the problems you are facing with the Partitioner?

>Any reasons for deciding to move the
> PartitioningJob inside BSPJobClient from BSPJob?

Two reasons. First, BSPJob was just a configuration holder object, and I
didn't want to add the partitioning responsibility to that class.
Second, I wanted to know the number of splits before deciding
whether to repartition or not.
Repartitioning is done if:
- the number of splits found is not equal to the number of BSP tasks
  configured for the job, OR
- the user has set the flag "bsp.input.runtime.partitioning" to true, OR
- the user has specified a runtime partitioner class and enabled runtime
  partitioning.
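
To make the rule concrete, here is a rough sketch of that decision in Java. It only illustrates the three conditions listed above and is not code from the patch: the class name, the method name, and all parameters are made up for this example, and the real check may read these values from the job configuration instead.

```java
// Hypothetical illustration only; none of these names come from Hama or HAMA-700.
final class RepartitionDecision {

  private RepartitionDecision() {
    // Utility class, not meant to be instantiated.
  }

  /**
   * Returns true if any of the three conditions from the list above holds.
   *
   * @param numSplits                  number of input splits found
   * @param numBspTasks                number of BSP tasks configured for the job
   * @param forcedByFlag               assumed value of "bsp.input.runtime.partitioning"
   * @param hasRuntimePartitioner      whether the user specified a runtime partitioner class
   * @param runtimePartitioningEnabled whether the user enabled runtime partitioning
   */
  static boolean shouldRepartition(int numSplits, int numBspTasks,
      boolean forcedByFlag, boolean hasRuntimePartitioner,
      boolean runtimePartitioningEnabled) {
    // Condition 1: split count and task count disagree.
    boolean sizeMismatch = numSplits != numBspTasks;
    // Condition 3: a user-supplied partitioner that is also enabled.
    boolean userPartitioner = hasRuntimePartitioner && runtimePartitioningEnabled;
    // Condition 2 is the flag itself; any one condition is enough.
    return sizeMismatch || forcedByFlag || userPartitioner;
  }

  public static void main(String[] args) {
    // One split on the local file system, two tasks requested.
    System.out.println(shouldRepartition(1, 2, false, false, false));
    // Counts match and nothing is forced.
    System.out.println(shouldRepartition(2, 2, false, false, false));
  }
}
```

With one input split and two requested tasks, the first condition alone triggers repartitioning, which is the local file system case discussed further down the thread.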

Thanks,
Suraj

On Tue, Jan 8, 2013 at 11:31 AM, Apurv Verma <[email protected]> wrote:

> Thanks, let me have a careful look at it. On a cursory look, I seem to
> understand the basic idea. Any reasons for deciding to move the
> PartitioningJob inside BSPJobClient from BSPJob?
> BTW the current partitioner doesn't work as intended, only the default
> partitioner HashPartitioner works fine, if I try to put some custom
> partitioner there are problems.
>
> Let's resolve the partitioning completely before the spilling message
> queue.
>
>
> --
> Regards,
> Apurv Verma
>
>
>
>
> On Tue, Jan 8, 2013 at 8:39 PM, Suraj Menon <[email protected]>
> wrote:
>
> > Hey Apurv, please check HAMA-700.patch_Jan7. Feel free to provide
> > suggestions or even work on it.
> >
> > Thanks,
> > Suraj
> >
> > On Tue, Jan 8, 2013 at 9:21 AM, Apurv Verma <[email protected]> wrote:
> >
> > > Hey Edward,
> > > There was a compile bug which I fixed temporarily. isPartitioned was not
> > > being initialized. Could you please check the last commit? I have currently
> > > initialized it to false, but I guess this should be configurable.
> > > There was some JIRA where we wanted partitioning to be skipped if the user
> > > thinks his data is already partitioned.
> > >
> > > Thanks again.
> > >
> > >
> > > --
> > > Regards,
> > > Apurv Verma
> > >
> > >
> > >
> > >
> > > On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <[email protected]
> > > >wrote:
> > >
> > > > Thanks, then I'll finish tomorrow. Please feel free to comment there.
> > > >
> > > > On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili
> > > > <[email protected]> wrote:
> > > > > thanks Edward, it looks good.
> > > > > Tommaso
> > > > >
> > > > >
> > > > > 2013/1/8 Edward J. Yoon <[email protected]>
> > > > >
> > > > >> Please review this:
> > > > >>
> > > > >> http://wiki.apache.org/hama/Partitioning
> > > > >>
> > > > >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <
> > [email protected]
> > > >
> > > > >> wrote:
> > > > > >> > I mean, the pre-partitioning or resizing partitions is really important.
> > > > >> >
> > > > >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <
> > > [email protected]
> > > > >
> > > > >> wrote:
> > > > >> >> This is another talk ...
> > > > >> >>
> > > > > >> >> Unlike MapReduce, I think, Hama BSP will handle tasks whose input is
> > > > > >> >> small in size but large in computational complexity, such as graph,
> > > > > >> >> sparse matrix, and machine learning algorithms.
> > > > >> >>
> > > > >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <
> > > > [email protected]>
> > > > >> wrote:
> > > > > >> >>> Even though the numbers of splits and tasks are the same, a user-defined
> > > > > >> >>> partitioning job should be run (because it is not only for resizing
> > > > > >> >>> partitions; for example, range partitioning of an unsorted data set or
> > > > > >> >>> hash key partitioning, etc.).
> > > > >> >>>
> > > > >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <
> > > [email protected]
> > > > >
> > > > >> wrote:
> > > > > >> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner; it's
> > > > > >> >>>>>    named so in the HEAD (1429573) of trunk. It isn't removed, but it
> > > > > >> >>>>>    isn't referred to anywhere else. I can't find any references to it
> > > > > >> >>>>>    in the workspace.
> > > > >> >>>>>
> > > > >> >>>>
> > > > > >> >>>> It is referenced in the BSPJob#waitForCompletion function, as a
> > > > > >> >>>> separate BSP job that creates the specified splits.
> > > > >> >>>>
> > > > >> >>>>
> > > > > >> >>>>>    2. job.setPartitioner is the same as setting
> > > > > >> >>>>>    "bsp.input.partitioner.class". Anyway, as I see it, partitions are
> > > > > >> >>>>>    not being created, because of which the following happens.
> > > > > >> >>>>>    If I am running the task on the local fs and not HDFS, there's just
> > > > > >> >>>>>    one input split, and even if I set a partitioner to create two
> > > > > >> >>>>>    partitions and set bsp.setNumTasks(2), this is overridden and only
> > > > > >> >>>>>    one task is executed.
> > > > > >> >>>>>    See BSPJobClient#submitJobInternal(),
> > > > > >> >>>>>    where it does the following:
> > > > > >> >>>>>    job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
> > > > > >> >>>>>    Line 326.
> > > > >> >>>>>
> > > > > >> >>>> This job is set to run if the number of splits != number of tasks, or
> > > > > >> >>>> if forced by the configuration. I can share the current state of my
> > > > > >> >>>> HAMA-700 patch with you.
> > > > >> >>>>
> > > > >> >>>>
> > > > > >> >>>>>    3. So here is what I think is happening: the Partitioner is not in
> > > > > >> >>>>>    the codepath (try putting a breakpoint inside the partitioner and
> > > > > >> >>>>>    executing a non-graph BSP task), so partitions are not being
> > > > > >> >>>>>    created and writeSplits() is returning 1.
> > > > > >> >>>>>    [ writeSplits() returns the number of splits in the input. ]
> > > > >> >>>>>
> > > > >> >>>>
> > > > >> >>>> Probably because it is running as a separate process?
> > > > >> >>>
> > > > >> >>>
> > > > >> >>>
> > > > >> >>> --
> > > > >> >>> Best Regards, Edward J. Yoon
> > > > >> >>> @eddieyoon
> > > > >> >>
> > > > >> >>
> > > > >> >>
> > > > >> >> --
> > > > >> >> Best Regards, Edward J. Yoon
> > > > >> >> @eddieyoon
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Best Regards, Edward J. Yoon
> > > > >> > @eddieyoon
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best Regards, Edward J. Yoon
> > > > >> @eddieyoon
> > > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Edward J. Yoon
> > > > @eddieyoon
> > > >
> > >
> >
>
