"Omitted chunks" is an error. By definition, if chunks are omitted they won't be there. Duplicates and other peculiarities will happen in the event of failures. As you say, it's a consequence of the distributed environment.
SimpleArchiver should do the cleanup you want.

--Ari

On Mon, Nov 22, 2010 at 11:39 PM, Ying Tang <[email protected]> wrote:
> Hi all,
> After reading the Chukwa docs, my understanding is that the log data flow
> is:
> adaptor --> agent --> collector --> sink file --> ....
> The doc says, "Data in the sink may include duplicate and omitted
> chunks," and it is not recommended to write MapReduce jobs that directly
> examine the data sink, "because jobs will likely discard most of their input".
>
> Here are my questions:
> 1. Why does the data in the sink file include duplicate and omitted chunks?
> Because of the distributed environment?
> 2. How can the problem above be solved? The Simple Archiver generates the
> archive file, and duplicates have been removed. So the Simple Archiver can
> only solve the duplicate data, right?
>
> --
> Best regards,
> Ivy Tang



--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
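To make the dedup idea concrete, here is a rough sketch (plain Java, not Chukwa's actual SimpleArchiver code) of what that cleanup amounts to: treat each chunk as identified by the stream it came from plus its position in that stream, and keep only the first copy of each identity. The field names (source, streamName, seqID) are assumptions for illustration, not the real Chunk API.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SinkDedupSketch {

    // Illustrative chunk record; field names are assumptions, not Chukwa's API.
    static class SinkChunk {
        final String source;     // host the adaptor ran on
        final String streamName; // e.g. the log file being tailed
        final long seqID;        // position of this chunk within the stream
        final String data;

        SinkChunk(String source, String streamName, long seqID, String data) {
            this.source = source;
            this.streamName = streamName;
            this.seqID = seqID;
            this.data = data;
        }

        // Two chunks are "the same" if they cover the same slice of the same
        // stream, regardless of which collector happened to write them out.
        String identity() {
            return source + "/" + streamName + "@" + seqID;
        }
    }

    // One pass over the sink: keep the first copy of each chunk identity and
    // drop later duplicates (e.g. retransmissions after a collector failure).
    static List<SinkChunk> dedup(List<SinkChunk> sinkChunks) {
        Set<String> seen = new HashSet<>();
        List<SinkChunk> deduped = new ArrayList<>();
        for (SinkChunk c : sinkChunks) {
            if (seen.add(c.identity())) {
                deduped.add(c);
            }
        }
        return deduped;
    }

    public static void main(String[] args) {
        List<SinkChunk> sink = new ArrayList<>();
        sink.add(new SinkChunk("host1", "/var/log/app.log", 0, "line 1"));
        sink.add(new SinkChunk("host1", "/var/log/app.log", 0, "line 1")); // resent after a failure
        sink.add(new SinkChunk("host1", "/var/log/app.log", 100, "line 2"));
        System.out.println(dedup(sink).size() + " unique chunks"); // prints "2 unique chunks"
    }
}

Note that omitted chunks can't be recovered this way: a missing chunk simply never shows up in the sink, which is why duplicate removal is the only cleanup the archiver can do.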
