Hi Bright,

I could not fully understand the bucketing approach you mentioned. How would we bucket the data when we have no idea what the data will be?
How about using a TreeMultiset?

Thanks,
Ajay

On Wed, Mar 8, 2017 at 11:24 PM, Bright Chen <bri...@datatorrent.com> wrote:

> Hi Ajay,
> I think sorting in getOutput() will probably get this method stuck due to
> the very high volume of computation.
> And as we still need to persist the data, it will not be very helpful for
> increasing the performance of tuple processing. Probably we can bucket the
> data by range of value, such as the following:
> - process tuple in one window: sort data of current window in memory
> - end window: merge the sorted memory data into buckets.
>
> thanks
> Bright
>
> On Wed, Mar 8, 2017 at 8:51 AM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
>
> > Hi Thomas,
> >
> > I looked at TopN. The accumulate() of TopN is O(n*k). Using a similar
> > approach for Sort will lead to O(n^2) complexity.
> > Since we have to sort all elements, we can do it in a single sort call in
> > getOutput().
> >
> >
> > On Wed, Mar 8, 2017 at 10:09 PM, Thomas Weise <t...@apache.org> wrote:
> >
> > > Look at the existing topN accumulation. It should be a generalization,
> > > where you don't have a limit.
> > >
> > >
> > > On Wed, Mar 8, 2017 at 8:05 AM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose the Sort Accumulation. The accumulation will be
> > > > responsible for sorting the input POJO stream. The accumulation will
> > > > require a comparator to compare and sort the input tuples. Another boolean
> > > > parameter "sortDesc" will be used to decide the sorting order.
> > > >
> > > > Let me know your views.
> > > >
> > > > Thanks,
> > > > Ajay
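
For illustration, here is a minimal sketch of the sorted-multiset idea raised at the top of this message. It is not the Malhar Accumulation interface and not code from this thread. The class name SortAccumulationSketch and its method shapes are hypothetical, and a JDK TreeMap of counts stands in for a sorted multiset (Guava's TreeMultiset is presumably the class meant) so that the example needs no external dependency. The comparator and the "sortDesc" flag come from the original proposal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: keeps tuples ordered as they arrive, so
// getOutput() does not have to sort everything in one call.
class SortAccumulationSketch<T> {
    // Sorted map from tuple to multiplicity; a JDK stand-in for a sorted multiset.
    private final TreeMap<T, Integer> counts;

    SortAccumulationSketch(Comparator<T> comparator, boolean sortDesc) {
        // "sortDesc" simply reverses the supplied comparator.
        this.counts = new TreeMap<>(sortDesc ? comparator.reversed() : comparator);
    }

    // O(log d) per tuple, where d is the number of distinct tuples seen so far.
    // Caveat: tuples that compare equal are represented by the first instance
    // seen, so the comparator must distinguish tuples that need to stay distinct.
    void accumulate(T tuple) {
        counts.merge(tuple, 1, Integer::sum);
    }

    // Expands the counts back into a sorted list, in O(n) for n accumulated tuples.
    List<T> getOutput() {
        List<T> out = new ArrayList<>();
        for (Map.Entry<T, Integer> entry : counts.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                out.add(entry.getKey());
            }
        }
        return out;
    }
}
```

With this shape the total cost is O(n log n) spread across the accumulate() calls rather than concentrated in getOutput(). It does not address the persistence concern raised above, since the whole structure still lives in memory.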