Re: Sort Accumulation

Bright Chen Tue, 14 Mar 2017 09:10:27 -0700

Hi Ajay,
My feeling is that you assume all tuples can be handled in memory.
How would we handle the case where memory is not enough to hold all tuples?


thanks
-Bright

On Mon, Mar 13, 2017 at 11:00 PM, AJAY GUPTA <[email protected]> wrote:

> Hi Apex Dev community,
>
> Kindly let me know if implementing this accumulation using TreeMultiSet is
> fine.
>
>
> Ajay
>
> On Thu, Mar 9, 2017 at 12:24 PM, AJAY GUPTA <[email protected]> wrote:
>
> > Hi Bright,
> >
> > I could not completely understand the bucketing approach you mentioned.
> > How would we bucket the data considering we have no idea what the data
> > will be?
> >
> > How about using a TreeMultiSet?
> >
> >
> > Thanks,
> > Ajay
> >
> > On Wed, Mar 8, 2017 at 11:24 PM, Bright Chen <[email protected]>
> > wrote:
> >
> >> Hi Ajay,
> >> I think sorting in getOutput() will probably stall that method because
> >> of the very high volume of computation.
> >> And since we still need to persist the data, it will not help much in
> >> improving the performance of tuple processing. We could probably bucket
> >> the data by value range, for example:
> >> - process tuple in one window: sort data of current window in memory
> >> - end window: merge the sorted memory data into buckets.
> >>
> >> thanks
> >> Bright
> >>
> >> On Wed, Mar 8, 2017 at 8:51 AM, AJAY GUPTA <[email protected]> wrote:
> >>
> >> > Hi Thomas,
> >> >
> >> > I looked at TopN. The accumulate() of TopN is O(n*k). Using a similar
> >> > approach for Sort will lead to O(n^2) complexity.
> >> > Since we have to sort all elements, we can do it in a single sort call
> >> > in getOutput().
> >> >
> >> >
> >> > On Wed, Mar 8, 2017 at 10:09 PM, Thomas Weise <[email protected]> wrote:
> >> >
> >> > > Look at the existing topN accumulation. It should be a generalization,
> >> > > where you don't have a limit.
> >> > >
> >> > >
> >> > > On Wed, Mar 8, 2017 at 8:05 AM, AJAY GUPTA <[email protected]> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I would like to propose the Sort Accumulation. The accumulation will be
> >> > > > responsible for sorting the input POJO stream. The accumulation will
> >> > > > require a comparator to compare and sort the input tuples. Another
> >> > > > boolean parameter "sortDesc" will be used to decide sorting order.
> >> > > >
> >> > > > Let me know your views.
> >> > > >
> >> > > > Thanks,
> >> > > > Ajay
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
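
A minimal Java sketch of the accumulation proposed in this thread, assuming a sorted multiset keeps tuples in order as they arrive. The class and method names are hypothetical; they only loosely mirror the accumulate/merge/getOutput steps discussed above and do not implement any Apex interface. A java.util.TreeMap of counts stands in for the TreeMultiSet mentioned above so that the example needs no external library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a sort accumulation backed by a counted sorted map.
class SortAccumulationSketch<T> {
    // Effective ordering: the user's comparator, reversed when sortDesc is true.
    private final Comparator<T> order;

    SortAccumulationSketch(Comparator<T> comparator, boolean sortDesc) {
        this.order = sortDesc ? comparator.reversed() : comparator;
    }

    // Empty accumulated value: tuple -> number of occurrences, kept sorted.
    TreeMap<T, Integer> defaultAccumulatedValue() {
        return new TreeMap<>(order);
    }

    // O(log n) per tuple, so there is no large sort left for getOutput().
    TreeMap<T, Integer> accumulate(TreeMap<T, Integer> acc, T input) {
        acc.merge(input, 1, Integer::sum);
        return acc;
    }

    // Combine two accumulated values by adding the counts of b into a.
    TreeMap<T, Integer> merge(TreeMap<T, Integer> a, TreeMap<T, Integer> b) {
        b.forEach((tuple, count) -> a.merge(tuple, count, Integer::sum));
        return a;
    }

    // In-order traversal; each key is emitted as many times as it was seen.
    List<T> getOutput(TreeMap<T, Integer> acc) {
        List<T> out = new ArrayList<>();
        for (Map.Entry<T, Integer> e : acc.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

Two caveats with this sketch: tuples that compare as equal are collapsed into a single key with a count, so POJOs that differ in fields the comparator ignores would lose their identity (a list per key would avoid that), and everything is still held in memory, which is the concern raised at the top of this thread.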

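
The two-step outline in Bright's earlier message (sort the current window's tuples in memory, then merge them at end window) could look roughly like the Java sketch below. All names are hypothetical. For brevity it merges into a single in-memory sorted run rather than into value-range buckets, and it omits persistence entirely, so it illustrates only the sort-then-merge step, not a solution to the memory limit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: sort each window in memory, merge at end of window.
class WindowedSortSketch<T> {
    private final Comparator<T> order;
    private final List<T> windowBuffer = new ArrayList<>(); // current window only
    private List<T> sortedRun = new ArrayList<>();          // merged result so far

    WindowedSortSketch(Comparator<T> order) {
        this.order = order;
    }

    // Buffering is O(1) per tuple; the sort is deferred to endWindow().
    void processTuple(T tuple) {
        windowBuffer.add(tuple);
    }

    // Sort only this window's tuples, then merge them into the sorted run.
    void endWindow() {
        windowBuffer.sort(order);
        sortedRun = mergeSorted(sortedRun, windowBuffer);
        windowBuffer.clear();
    }

    // Standard two-way merge of two already sorted lists.
    private List<T> mergeSorted(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (order.compare(a.get(i), b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // Copy of everything merged so far, in sorted order.
    List<T> getOutput() {
        return new ArrayList<>(sortedRun);
    }
}
```

With a window of w tuples and n tuples already merged, each end window costs O(w log w) for the sort plus O(n + w) for the merge, which keeps the per-tuple path cheap. Splitting the run into value-range buckets would limit each merge to the buckets a window actually touches, but choosing bucket boundaries without knowing the data is the open question raised in the thread.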