On Thu, Mar 15, 2012 at 12:36 AM, IvyTang <ivytang0...@gmail.com> wrote:
> As the wiki says, data in the sink may include duplicate and omitted
> chunks, so we need to demux and archive the raw data sink files.
>
> The start-data-processors.sh script runs three processes: ChukwaArchiveManager,
> PostProcessorManager, and DemuxManager.
>
> This page http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html
> explains the data workflow.
>
> First, DemuxManager moves the raw *.done files to
> dataSinkArchives/[yyyyMMdd]/*/*.done.
>
> Then, every half hour or so, ChukwaArchiveManager aggregates and removes the
> dataSinkArchives data using M/R, from dataSinkArchives/[yyyyMMdd]/*/*.done
> to finalArchives/.
>
> The complete logflow is logs/*.done
> ==> dataSinkArchives/[yyyyMMdd]/*/*.done ==> finalArchives.
>
> 1. Here I have a question. According to
> http://incubator.apache.org/chukwa/docs/r0.4.0/programming.html#Using+MapReduce
> (Simple Archiver & Demux), the simple archiver removes the duplicates.
> Does "the simple archiver" refer to the ChukwaArchiveManager?
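[Editorial note: the logflow quoted above can be sketched on a local filesystem as follows. This is purely illustrative; the real DemuxManager and ChukwaArchiveManager operate on HDFS, and the archive step is an M/R job, not a copy. All file names below are hypothetical stand-ins.]

```shell
#!/bin/sh
# Illustrative sketch of the Chukwa logflow described above, simulated with
# plain directory moves. Not Chukwa's actual code.
set -e
root=$(mktemp -d)
day=$(date +%Y%m%d)

# Collectors drop completed sink files as logs/*.done
mkdir -p "$root/logs"
echo "chunk-data" > "$root/logs/sample.done"

# Step 1: DemuxManager moves raw *.done files into dataSinkArchives/[yyyyMMdd]/
mkdir -p "$root/dataSinkArchives/$day/hostA"
mv "$root/logs/sample.done" "$root/dataSinkArchives/$day/hostA/"

# Step 2: ChukwaArchiveManager aggregates dataSinkArchives into finalArchives/
# and removes the originals (a plain concatenate-then-delete stands in for M/R).
mkdir -p "$root/finalArchives"
cat "$root/dataSinkArchives/$day"/*/*.done > "$root/finalArchives/archive_$day.arc"
rm -r "$root/dataSinkArchives/$day"

ls "$root/finalArchives"
```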
No, these are separate pieces. Back in the day, I found that ChukwaArchiveManager
was too complicated for my needs, and I wanted a simple command that would just
archive whatever was in the sink. That's the simple archiver. It's found in
org.apache.hadoop.chukwa.extraction.archive.SinkArchiver.

> 3. Can I just run the DemuxManager & ChukwaArchiveManager? I found I
> just need these two components.

Yes, you should be fine with just those two if they meet your needs.

--
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department
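[Editorial note: the deduplication the simple archiver performs can be illustrated in miniature. In reality SinkArchiver deduplicates Chukwa chunks inside an M/R job; here, duplicate text lines stand in for duplicate chunks, and `sort -u` stands in for the reduce-side collapse. Everything below is a hypothetical sketch, not Chukwa code.]

```shell
#!/bin/sh
# Illustrative only: collapse duplicate records across two sink files,
# mimicking the effect (not the mechanism) of the simple archiver.
set -e
work=$(mktemp -d)

# Two sink files sharing an overlapping record, as can happen when a
# collector write is retried.
printf 'rec1\nrec2\n' > "$work/sink1.done"
printf 'rec2\nrec3\n' > "$work/sink2.done"

# "Archive" them with duplicates removed.
sort -u "$work"/sink*.done > "$work/archive.out"
cat "$work/archive.out"
```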