Howdy.

There were a couple of reasons to archive the pre-demux data rather than
the parsed records.
- The pre-demux format is slightly more compact.
- For most text-based logs, the demux processing doesn't really add
  anything, and it is one more layer that can corrupt or lose data.
- If you want to change how you generate records, you want to have the
  raw data available.

2010/3/9 Guillermo Pérez <bi...@tuenti.com>:
> I was wondering why we keep the original files processed in chukwa in
> the finalArchives folder. I want to generate chukwa records for doing
> pig reports on them. I would be interested in generating an archive of
> chukwa records, but right now chukwa seems to generate an archive of
> the original files. Is there any reason for doing this? I would rather
> just delete the files in the dataSink after they have been loaded as
> records. I'm just curious about the rationale for doing this.
>
> Thanks a lot!
>
> --
> Guille -ℬḭṩḩø- <bi...@tuenti.com>
> :wq
>

-- 
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department