Hey Apurv, please check HAMA-700.patch_Jan7. Feel free to provide
suggestions or even work on it.

Thanks,
Suraj

On Tue, Jan 8, 2013 at 9:21 AM, Apurv Verma <[email protected]> wrote:

> Hey Edward,
>  There was a compile bug which I fixed temporarily: isPartitioned was not
> being initialized. Could you please check the last commit? I have
> initialized it to false for now, but I think this should be configurable.
> There was a JIRA issue where we wanted partitioning to be skipped if the
> user knows the data is already partitioned.
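
A minimal sketch of making the flag configurable, assuming a plain key/value
configuration. The key name "bsp.input.partitioned" and the class
PartitionFlag are hypothetical illustrations, not existing Hama settings or
classes. The default stays false, matching the temporary fix above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: read the "already partitioned" flag from configuration
// instead of hard-coding it. The key below is an assumption, not a real Hama key.
final class PartitionFlag {
  static final String ALREADY_PARTITIONED_KEY = "bsp.input.partitioned";

  private PartitionFlag() {}

  // A missing key defaults to false, i.e. the input still needs partitioning.
  static boolean isPartitioned(Map<String, String> conf) {
    return Boolean.parseBoolean(conf.getOrDefault(ALREADY_PARTITIONED_KEY, "false"));
  }

  // Partitioning is skipped only when the user declares the input already partitioned.
  static boolean shouldRunPartitioning(Map<String, String> conf) {
    return !isPartitioned(conf);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println("run partitioning (default): " + shouldRunPartitioning(conf));
    conf.put(ALREADY_PARTITIONED_KEY, "true");
    System.out.println("run partitioning (user opted out): " + shouldRunPartitioning(conf));
  }
}
```

With the key absent, partitioning runs as before; a user whose data is already
partitioned would opt out by setting the key to true.
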
>
> Thanks again.
>
>
> --
> Regards,
> Apurv Verma
>
>
>
>
> On Tue, Jan 8, 2013 at 3:44 PM, Edward J. Yoon <[email protected]> wrote:
>
> > Thanks, then I'll finish tomorrow. Please feel free to comment there.
> >
> > On Tue, Jan 8, 2013 at 7:04 PM, Tommaso Teofili <[email protected]> wrote:
> > > Thanks Edward, it looks good.
> > > Tommaso
> > >
> > >
> > > 2013/1/8 Edward J. Yoon <[email protected]>
> > >
> > >> Please review this:
> > >>
> > >> http://wiki.apache.org/hama/Partitioning
> > >>
> > >> On Mon, Jan 7, 2013 at 6:17 AM, Edward J. Yoon <[email protected]> wrote:
> > >> > I mean, pre-partitioning or resizing partitions is really important.
> > >> >
> > >> > On Mon, Jan 7, 2013 at 6:15 AM, Edward J. Yoon <[email protected]> wrote:
> > >> >> This is a separate topic...
> > >> >>
> > >> >> Unlike MapReduce, I think Hama BSP will handle tasks whose input is
> > >> >> small in size but large in computational complexity, such as graph,
> > >> >> sparse matrix, and machine learning algorithms.
> > >> >>
> > >> >> On Mon, Jan 7, 2013 at 5:57 AM, Edward J. Yoon <[email protected]> wrote:
> > >> >>> Even when the number of splits equals the number of tasks, a
> > >> >>> user-defined partitioning job should still run, because partitioning
> > >> >>> is not only for resizing partitions (for example, range partitioning
> > >> >>> of an unsorted data set, hash key partitioning, etc.).
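
To make the two examples concrete, here is a minimal sketch of hash key
partitioning and range partitioning. The SimplePartitioner interface and both
classes are hypothetical simplifications written for illustration; they are
not Hama's own Partitioner API. The range partitioner assumes distinct split
points.

```java
import java.util.Arrays;

// Hypothetical, simplified partitioner interface (not Hama's API).
interface SimplePartitioner<K> {
  int getPartition(K key, int numPartitions);
}

// Hash key partitioning: keys with the same hash code land in the same partition.
final class HashKeyPartitioner<K> implements SimplePartitioner<K> {
  @Override
  public int getPartition(K key, int numPartitions) {
    // Mask the sign bit so negative hash codes still map into [0, numPartitions).
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

// Range partitioning: partition i holds keys in [splitPoints[i-1], splitPoints[i]).
// Each key is placed independently, so the input does not need to be sorted.
final class RangePartitioner implements SimplePartitioner<Integer> {
  private final int[] splitPoints;

  RangePartitioner(int... splitPoints) {
    this.splitPoints = splitPoints.clone();
    Arrays.sort(this.splitPoints);
  }

  @Override
  public int getPartition(Integer key, int numPartitions) {
    // The partition index is the number of split points that are <= key.
    int idx = Arrays.binarySearch(splitPoints, key);
    int partition = idx >= 0 ? idx + 1 : -(idx + 1);
    // Clamp in case fewer partitions than ranges were requested.
    return Math.min(partition, numPartitions - 1);
  }
}

final class PartitioningDemo {
  public static void main(String[] args) {
    int[] unsorted = {42, 7, 19, 3, 88, 20};
    RangePartitioner range = new RangePartitioner(10, 20);
    HashKeyPartitioner<Integer> hash = new HashKeyPartitioner<>();
    for (int key : unsorted) {
      System.out.println(key + " -> range " + range.getPartition(key, 3)
          + ", hash " + hash.getPartition(key, 3));
    }
  }
}
```

Neither strategy depends on how many splits the input already has, which is
one way to read the point above: the job changes which records end up
together, not only how many partitions there are.
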
> > >> >>>
> > >> >>> On Mon, Jan 7, 2013 at 5:28 AM, Suraj Menon <[email protected]> wrote:
> > >> >>>>>    1. I am referring to org.apache.hama.bsp.PartitioningRunner, which
> > >> >>>>>    is still named that in the HEAD (1429573) of trunk. It hasn't been
> > >> >>>>>    removed, but it isn't referred to anywhere else; I can't find any
> > >> >>>>>    references to it in the workspace.
> > >> >>>>>
> > >> >>>>
> > >> >>>> It is referenced in the BSPJob#waitForCompletion function, where it
> > >> >>>> runs as a separate BSP job to create the specified splits.
> > >> >>>>
> > >> >>>>
> > >> >>>>>    2. job.setPartitioner is the same as setting
> > >> >>>>>    "bsp.input.partitioner.class". Anyway, as far as I can tell,
> > >> >>>>>    partitions are not being created, which causes the following.
> > >> >>>>>    If I run the task on the local FS rather than HDFS, there is just
> > >> >>>>>    one input split, and even if I set a partitioner to create two
> > >> >>>>>    partitions and call bsp.setNumTasks(2), this is overridden and
> > >> >>>>>    only one task is executed. See BSPJobClient#submitJobInternal(),
> > >> >>>>>    line 326, which does the following:
> > >> >>>>>        job.setNumBspTask(writeSplits(job, submitSplitFile, maxTasks));
> > >> >>>>>
> > >> >>>> This job is set to run if the number of splits differs from the
> > >> >>>> number of tasks, or if it is forced by the configuration. I can share
> > >> >>>> the current state of my HAMA-700 patch with you.
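
The condition described above can be sketched as a small predicate. The class
and method names are hypothetical, and the "forced" parameter stands in for
whatever configuration setting forces the partitioning job. The demo uses the
local-FS case from point 2: one input split but two requested tasks.

```java
// Hypothetical sketch of when the separate partitioning job is launched.
final class PartitioningDecision {
  private PartitioningDecision() {}

  // Run the partitioning job when the split count does not match the
  // requested task count, or when the configuration forces it.
  static boolean needsPartitioningJob(int numSplits, int numTasks, boolean forced) {
    return forced || numSplits != numTasks;
  }

  public static void main(String[] args) {
    // Local FS case: one input split, two requested tasks.
    System.out.println(needsPartitioningJob(1, 2, false)); // prints true
    // Splits already match tasks: the job only runs if forced.
    System.out.println(needsPartitioningJob(2, 2, false)); // prints false
    System.out.println(needsPartitioningJob(2, 2, true));  // prints true
  }
}
```

If this check is never reached (for example, because the partitioner is not in
the code path), the task count falls back to the split count returned by
writeSplits(), which would match the single-task behaviour reported in point 2.
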
> > >> >>>>
> > >> >>>>
> > >> >>>>>    3. So here is what I think is happening: the Partitioner is not in
> > >> >>>>>    the code path (try putting a breakpoint inside the partitioner and
> > >> >>>>>    executing a non-graph BSP task), so partitions are not being created
> > >> >>>>>    and writeSplits() is returning 1.
> > >> >>>>>    [writeSplits() returns the number of splits in the input.]
> > >> >>>>>
> > >> >>>>
> > >> >>>> Probably because it is running as a separate process?
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> --
> > >> >>> Best Regards, Edward J. Yoon
> > >> >>> @eddieyoon
> > >> >>
> > >> >
> > >>
> > >>
> >
> >
>
