Hi Ajay,

My impression is that you assume all tuples are held in memory. How would we handle the case where memory is not enough to hold all tuples?
thanks
-Bright

On Mon, Mar 13, 2017 at 11:00 PM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
> Hi Apex Dev community,
>
> Kindly let me know if implementing this accumulation using TreeMultiset is fine.
>
> Ajay
>
> On Thu, Mar 9, 2017 at 12:24 PM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
> > Hi Bright,
> >
> > I could not completely understand the bucketing approach you mentioned. How would we bucket the data, considering we have no idea what the data will be?
> >
> > How about using a TreeMultiset?
> >
> > Thanks,
> > Ajay
> >
> > On Wed, Mar 8, 2017 at 11:24 PM, Bright Chen <bri...@datatorrent.com> wrote:
> >> Hi Ajay,
> >> I think sorting in getOutput() will probably make that method block for a long time because of the very high volume of computation.
> >> And since we still need to persist the data, it will not help much with tuple-processing performance. Perhaps we can bucket the data by value range, as follows:
> >> - process tuple in one window: sort the current window's data in memory
> >> - end window: merge the sorted in-memory data into the buckets.
> >>
> >> thanks
> >> Bright
> >>
> >> On Wed, Mar 8, 2017 at 8:51 AM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
> >> > Hi Thomas,
> >> >
> >> > I looked at TopN. The accumulate() of TopN is O(n*k). Using a similar approach for Sort, where k grows to n, will lead to O(n^2) complexity.
> >> > Since we have to sort all elements, we can do it in a single sort call in getOutput().
> >> >
> >> > On Wed, Mar 8, 2017 at 10:09 PM, Thomas Weise <t...@apache.org> wrote:
> >> > > Look at the existing topN accumulation. It should be a generalization, where you don't have a limit.
> >> > >
> >> > > On Wed, Mar 8, 2017 at 8:05 AM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I would like to propose the Sort Accumulation. The accumulation will be responsible for sorting the input POJO stream. The accumulation will require a comparator to compare and sort the input tuples. Another boolean parameter, "sortDesc", will be used to decide the sorting order.
> >> > > >
> >> > > > Let me know your views.
> >> > > >
> >> > > > Thanks,
> >> > > > Ajay