Re: Sort Accumulation

AJAY GUPTA Wed, 08 Mar 2017 22:55:01 -0800

Hi Bright,

I could not completely understand the bucketing approach you mentioned. How
would we bucket the data, considering we have no idea what the data will be?


How about using a TreeMultiset?
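
A minimal sketch of the sorted-multiset idea, for illustration only. It does
not use Guava's TreeMultiset; as a stand-in it uses a java.util.TreeMap ordered
by the comparator, with a list per key so that tuples which compare equal are
all kept. The names SortedMultisetSketch, accumulate and getOutput are
hypothetical and only loosely mirror the accumulation discussed in this thread;
they are not the actual Apex API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: keeps tuples in sorted order as they arrive,
// so no separate sort is needed when the output is requested.
class SortedMultisetSketch<T> {
    // Key: a representative tuple; value: every tuple that compares equal to it.
    private final TreeMap<T, List<T>> groups;

    SortedMultisetSketch(Comparator<T> comparator, boolean sortDesc) {
        // The sortDesc flag simply reverses the supplied comparator.
        this.groups = new TreeMap<>(sortDesc ? comparator.reversed() : comparator);
    }

    // Insert one tuple; a tree lookup plus an append to that key's list.
    void accumulate(T tuple) {
        groups.computeIfAbsent(tuple, k -> new ArrayList<>()).add(tuple);
    }

    // Walk the tree in key order and flatten the per-key lists.
    List<T> getOutput() {
        List<T> out = new ArrayList<>();
        for (List<T> group : groups.values()) {
            out.addAll(group);
        }
        return out;
    }
}
```

With this shape each accumulate() is a logarithmic tree operation and
getOutput() is a linear walk. It still holds all tuples in memory, so it does
not by itself address the persistence concern raised below.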


Thanks,
Ajay

On Wed, Mar 8, 2017 at 11:24 PM, Bright Chen <bri...@datatorrent.com> wrote:

> Hi Ajay,
> I think sorting in getOutput() will probably stall that method because of the
> very high volume of computation.
> And since we still need to persist the data, it will not be very helpful for
> improving the performance of tuple processing. We could probably bucket the
> data by value range, for example:
> - process tuple (within one window): sort the current window's data in memory
> - end window: merge the sorted in-memory data into the buckets.
>
> thanks
> Bright
>
> On Wed, Mar 8, 2017 at 8:51 AM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
>
> > Hi Thomas,
> >
> > I looked at TopN. The accumulate() of TopN is O(n*k). Using a similar
> > approach for Sort would lead to O(n^2) complexity.
> > Since we have to sort all elements, we can do it in a single sort call in
> > getOutput().
> >
> >
> > On Wed, Mar 8, 2017 at 10:09 PM, Thomas Weise <t...@apache.org> wrote:
> >
> > > Look at the existing topN accumulation. It should be a generalization,
> > > where you don't have a limit.
> > >
> > >
> > > On Wed, Mar 8, 2017 at 8:05 AM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose the Sort Accumulation. The accumulation will be
> > > > responsible for sorting the input POJO stream. The accumulation will
> > > > require a comparator to compare and sort the input tuples. Another boolean
> > > > parameter "sortDesc" will be used to decide sorting order.
> > > >
> > > > Let me know your views.
> > > >
> > > > Thanks,
> > > > Ajay
> > > >
> > >
> >
>
