pi song
Mon, 07 Apr 2008 16:05:48 -0700
Another beautiful mapping again. Nagle's algorithm!

Ted, having to process more can be worthwhile as long as there are applications that really need it. I'm also looking forward to seeing "use cases" (I believe there will be many).

On 4/8/08, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Sliding windows are good for some things, but often involve lots of
> repeated work if the amount the window slides is small compared with
> the window width.
>
> Processing small batches of input into a summary form that can be applied
> to a summary of other small batches can avoid this repeated work.
>
> If you have a usable summary form, this works well. If you don't, there is
> a threshold by batch size where one approach or the other will be
> preferable. Having many small batches will eventually cause performance
> degradation so severe that processing the entire window will be faster.
> There are hybrid solutions as well where most of the window is grouped into
> a large batch and the new data is merged. This requires aging out old data
> which can get kind of tricky.
>
>
> On 4/7/08 8:02 AM, "pi song" <[EMAIL PROTECTED]> wrote:
>
> > 2. From Casper "Logfiles from S3 is already delayed apx. 2 hours. so I
> > really have no pressure.", this reminds me about stream processing again. I
> > used to say stream processing is real-time but MapReduce is batch. Now I've
> > just recognized that we don't have to be strictly real-time. If say we do
> > process using sliding windows every 2 hours, this way we still can apply
> > some stream concepts to real-world applications.
> >
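
To make the batch-summary idea from Ted's message concrete, here is a minimal sketch in Python. It assumes the statistic of interest has a usable summary form (here count, sum, and max, chosen only for illustration). The names Summary, SlidingWindow, and max_batches are hypothetical and do not come from the thread or from any Hadoop API. Each batch is reduced once to a summary, and the window result is obtained by merging summaries instead of reprocessing the raw records.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Mergeable summary of one batch (illustrative choice: count, sum, max)."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    @staticmethod
    def of(batch):
        # Reduce the raw records of one batch to a summary, exactly once.
        records = list(batch)
        return Summary(
            count=len(records),
            total=float(sum(records)),
            maximum=max(records, default=float("-inf")),
        )

    def merge(self, other):
        # Combining two summaries never touches the raw records again.
        return Summary(
            count=self.count + other.count,
            total=self.total + other.total,
            maximum=max(self.maximum, other.maximum),
        )


class SlidingWindow:
    """Keeps the summaries of the most recent max_batches batches."""

    def __init__(self, max_batches):
        # A bounded deque drops the oldest summary when a new one arrives,
        # which is the simplest way to age out old data.
        self.summaries = deque(maxlen=max_batches)

    def add_batch(self, batch):
        self.summaries.append(Summary.of(batch))

    def window_summary(self):
        # Cost grows with the number of batches, not the number of records.
        result = Summary()
        for summary in self.summaries:
            result = result.merge(summary)
        return result


window = SlidingWindow(max_batches=3)
for batch in ([1, 2], [3], [10, 4], [5, 6]):
    window.add_batch(batch)

current = window.window_summary()
print(current.count, current.total, current.maximum)
```

Aging out is easy here only because whole batches are dropped and the window is re-merged from the remaining summaries. The loop in window_summary is also where the cost of having many small batches shows up: the smaller the batches, the more summaries there are to merge per window.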