Re: duplicate data

Ariel Rabkin Thu, 18 Mar 2010 10:24:56 -0700

Howdy,

Chukwa does duplicate detection as follows: Each Chunk of data comes
with a stream name (such as the name of a log file) and a sequence ID.
If two chunks have the same name and ID, they're duplicates.  The
content isn't inspected.


So in your example, the first case (the same log file sent twice) will
be treated as duplicates, but the second (the exact same line appearing
twice in a log file) will not.
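
To make that concrete, here's a rough sketch of the idea in Python. It is
not Chukwa's actual code: the Chunk tuple, its field names, and
drop_duplicates are made up for illustration, and the sequence IDs are
arbitrary numbers. The only point is that the key is (stream name,
sequence ID) and the payload is never compared.

```python
# Hypothetical sketch of name+ID duplicate detection; not Chukwa's real API.
from typing import Iterable, List, NamedTuple, Set, Tuple


class Chunk(NamedTuple):
    stream_name: str  # e.g. the name of the log file
    seq_id: int       # sequence ID within that stream (made-up values below)
    data: bytes       # payload; deliberately ignored when deduplicating


def drop_duplicates(chunks: Iterable[Chunk]) -> List[Chunk]:
    """Keep the first chunk seen for each (stream_name, seq_id) pair."""
    seen: Set[Tuple[str, int]] = set()
    kept: List[Chunk] = []
    for chunk in chunks:
        key = (chunk.stream_name, chunk.seq_id)
        if key in seen:
            continue  # same name and ID: a duplicate, whatever the content
        seen.add(key)
        kept.append(chunk)
    return kept


# Case 2: the same line twice in one file. The content is identical, but
# the sequence IDs differ, so both chunks are kept.
once = [
    Chunk("/var/log/app.log", 100, b"same line\n"),
    Chunk("/var/log/app.log", 200, b"same line\n"),
]

# Case 1: the same file sent twice. Every (name, ID) pair repeats, so the
# second copy is dropped.
sent_twice = once + once

print(len(drop_duplicates(once)))        # both chunks survive
print(len(drop_duplicates(sent_twice)))  # second copy removed
```

The sketch assumes the two identical lines land in different chunks; if
they were in the same chunk they'd also both survive, since nothing looks
inside the chunk.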

--Ari

On Thu, Mar 18, 2010 at 8:59 AM, Corbin Hoenes <cor...@tynt.com> wrote:
> Does anyone have more information about how chukwa removes duplicates during 
> demux? How does it decide what is a duplicate?  There are two cases I am 
> thinking of...
>
> 1 - we send the same log file to chukwa 2x
> 2 - we have the exact same line in a log file 2x



-- 
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department
