Hi all,
After reading the Chukwa docs, my understanding of the log data flow is:
adaptor --> agent --> collector --> sink file --> ...
The docs say, "Data in the sink may include duplicate and
omitted chunks."
They also say it is not recommended to write MapReduce jobs that directly
examine the data sink, because "jobs will likely discard most of their input".
Here are my questions:
1. Why does the data in the sink files include duplicate and omitted chunks?
Is it because of the distributed environment?
2. How is this problem solved? The Simple Archiver generates the archive
files with duplicates removed, so the Simple Archiver only solves the
duplicate data (not the omitted chunks), right?
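
For what it's worth, my naive mental model of how such deduplication could
be done in a MapReduce job is roughly the sketch below. It is only an
illustration of the general "group records by a unique chunk id and keep one
copy" idea; the tab-separated record layout and the "chunk id" field are my
own assumptions, not the actual sink file format or the real Simple Archiver
code.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DedupSketch {

  // Map: key each record by an (assumed) unique identifier, here the first
  // tab-separated field, and pass the whole record through as the value.
  public static class DedupMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      String id = line.split("\t", 2)[0]; // hypothetical "chunk id" field
      context.write(new Text(id), value);
    }
  }

  // Reduce: all copies with the same id arrive together; keep only one.
  public static class DedupReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text v : values) {
        context.write(key, v); // write the first copy
        break;                 // drop the duplicates
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "dedup sketch");
    job.setJarByClass(DedupSketch.class);
    job.setMapperClass(DedupMapper.class);
    job.setReducerClass(DedupReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If that is roughly what the Simple Archiver does, I can see how it removes
duplicates, but I do not see how any job could recover chunks that were
omitted before reaching the sink.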
--
Best regards,
Ivy Tang