Howdy,

Chukwa does duplicate detection as follows: Each Chunk of data comes with a stream name (such as the name of a log file) and a sequence ID. If two chunks have the same name and ID, they're treated as duplicates. The content isn't inspected.
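Here is a rough Python sketch of that rule, in case it helps. It is not Chukwa's actual code: the `Chunk` class, the `drop_duplicates` function, the file path, and the sequence ID values are all made up for illustration. It also assumes that re-sending the same file produces chunks with the same stream name and sequence IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a Chukwa chunk (not the real class)."""
    stream_name: str  # e.g. the name of the log file
    seq_id: int       # sequence ID within that stream
    data: bytes       # payload; never looked at when detecting duplicates


def drop_duplicates(chunks):
    """Keep only the first chunk seen for each (stream_name, seq_id) pair."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = (chunk.stream_name, chunk.seq_id)
        if key in seen:
            continue  # same name and ID: duplicate, whatever the content is
        seen.add(key)
        unique.append(chunk)
    return unique


# Case 1: the same file sent twice (assumed to yield the same name and ID).
sent_twice = [
    Chunk("/var/log/app.log", 100, b"GET /index.html\n"),
    Chunk("/var/log/app.log", 100, b"GET /index.html\n"),
]

# Case 2: the same line appearing twice in one file, at different positions,
# so the two chunks carry different (made-up) sequence IDs.
same_line_twice = [
    Chunk("/var/log/app.log", 100, b"GET /index.html\n"),
    Chunk("/var/log/app.log", 200, b"GET /index.html\n"),
]

print(len(drop_duplicates(sent_twice)))       # 1: second copy is dropped
print(len(drop_duplicates(same_line_twice)))  # 2: both chunks are kept
```

The key point is that only the (name, ID) pair is compared, so identical content under different sequence IDs survives.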
So in your example, the former will be treated as a duplicate, not the latter.

--Ari

On Thu, Mar 18, 2010 at 8:59 AM, Corbin Hoenes <cor...@tynt.com> wrote:
> Does anyone have more information about how chukwa removes duplicates during
> demux? How does it decide what is a duplicate? There are two cases I am
> thinking of...
>
> 1 - we send the same log file to chukwa 2x
> 2 - we have the exact same line in a log file 2x

--
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department