So in the first scenario the stream name should be the same, but how do sequence IDs get generated? If I tailed the same log file 24 hours after doing it the first time, would the chunks have the same sequence IDs?
On Mar 18, 2010, at 11:24 AM, Ariel Rabkin wrote:

> Howdy,
>
> Chukwa does duplicate detection as follows: Each Chunk of data comes
> with a stream name (such as the name of a log file) and a sequence ID.
> If two chunks have the same name and ID, they're duplicate. The
> content isn't inspected.
>
> So in your example, the former will be treated as a duplicate, not the latter.
>
> --Ari
>
> On Thu, Mar 18, 2010 at 8:59 AM, Corbin Hoenes <cor...@tynt.com> wrote:
>> Does anyone have more information about how chukwa removes duplicates during
>> demux? How does it decide what is a duplicate? There are two cases I am
>> thinking of...
>>
>> 1 - we send the same log file to chukwa 2x
>> 2 - we have the exact same line in a log file 2x
>
> --
> Ari Rabkin asrab...@gmail.com
> UC Berkeley Computer Science Department
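
To check my understanding of the rule Ari describes, here is a minimal sketch in Python. It is not Chukwa's actual code: the `Chunk` tuple, the `drop_duplicates` helper, the file path, and the sequence ID values are all hypothetical. The only thing taken from the thread is that a chunk is identified by its stream name and sequence ID, and that the content is never compared.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for a chunk; not Chukwa's real class."""
    stream_name: str  # e.g. the name of the log file
    seq_id: int       # made-up values, only here to illustrate the rule
    data: str


def drop_duplicates(chunks):
    """Keep the first chunk seen for each (stream_name, seq_id) pair.

    The data field is deliberately ignored, matching the rule quoted
    above: same name and same ID means duplicate.
    """
    seen = set()
    kept = []
    for chunk in chunks:
        key = (chunk.stream_name, chunk.seq_id)
        if key in seen:
            continue  # same stream name and ID: treated as a duplicate
        seen.add(key)
        kept.append(chunk)
    return kept


chunks = [
    Chunk("/var/log/app.log", 100, "GET /index.html"),
    # Case 2: identical line, but a different sequence ID, so it is kept.
    Chunk("/var/log/app.log", 200, "GET /index.html"),
    # Case 1: the same chunk sent again with the same name and ID, so it is dropped.
    Chunk("/var/log/app.log", 100, "GET /index.html"),
]

kept = drop_duplicates(chunks)
```

Under this reading, whether a second tail of the same file is dropped depends entirely on whether it reproduces the same sequence IDs, which is the part I am asking about.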