Howdy.

There were a couple reasons to archive the pre-demux data, rather than
the parsed records.

- The pre-demux format is slightly more compact.
- For most text-based logs, the demux processing doesn't really add
  anything, and it adds another layer that can corrupt or lose data.
- If you want to change how you generate records, you want to have the
  raw data available.

2010/3/9 Guillermo Pérez <bi...@tuenti.com>:
> I was wondering why we keep the original files processed in chukwa in
> the finalArchives folder. I want to generate chukwa records for doing
> pig reports on them. I would be interested in generating an archive of
> chukwa records, but right now chukwa seems to generate an archive of
> the original files. There is any reason for doing this? I would rather
> just delete files in the dataSink, after having them loaded as
> records. I'm just curious on the rationale of doing this.
>
> Thanks a lot!
>
> --
> Guille -ℬḭṩḩø- <bi...@tuenti.com>
> :wq
>

--
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department