+1

Sent from my mobile. Please excuse the typos.

On 2011-02-24, at 5:33 PM, Eric Yang <[email protected]> wrote:

I also support skipping _*.

Regards,
Eric

On 2/24/11 4:27 PM, "Ariel Rabkin" <[email protected]> wrote:

I don't think this has been fixed yet in trunk, let alone 0.4. I would
support skipping everything starting with _. Is there an actual use
case this would break?

--Ari

On Thu, Feb 24, 2011 at 4:24 PM, Corbin Hoenes <[email protected]> wrote:
> We're using Cloudera's CDH3 beta 4 release. Maybe they've patched in the
> FileOutputCommitter stuff into their release as it's based on Hadoop 0.20.2.
>
> Looking at the source for Chukwa 0.3 (version we are on) the
> MoveToRepository class skips the _log and _temporary directories.
>
> Seems like Chukwa should skip the _SUCCESS directory as well? Or could a
> more general skip be used like skip anything starting with an underscore?
> (maybe too aggressive).
>
> Does Chukwa 0.4 or 0.5 fix this issue? (I'm probably going to just have to
> patch 0.3 but maybe just another reason to upgrade.)
>
> On Thu, Feb 24, 2011 at 12:20 PM, Jerome Boulon <[email protected]> wrote:
>>
>> This filename is coming from
>> here: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/constant-values.html
>> In general for hadoop you may want to avoid looking at any "_*" file since
>> those are Hadoop related files like (_temporary, _log,…)
>> /Jerome.
>>
>> From: Eric Yang <[email protected]>
>> Reply-To: "[email protected]"
>> <[email protected]>
>> Date: Thu, 24 Feb 2011 10:55:57 -0800
>> To: "[email protected]" <[email protected]>
>> Subject: Re: _SUCCESS files appearing in demuxOutput
>>
>> Hi Corbin,
>>
>> I have not seen this. What is the version of hadoop that you are using,
>> are you using 0.21? It looks like the _SUCCESS file is spilled out after
>> the demux mapreduce job. There are two possibilities leading to the creation
>> of this file. Demux is modified and it is doing something that is unexpected,
>> or the mapreduce framework 0.21 put that file there.
>>
>> If you are using 0.21, I would recommend avoiding it.
>>
>> A more stable version of Hadoop is the 0.20.100 branch, and you can download
>> it from:
>>
>> http://people.apache.org/~eyang/
>>
>> Regards,
>> Eric
>>
>> On 2/24/11 10:12 AM, "Corbin Hoenes" <[email protected]> wrote:
>>
>> Anyone seen this?
>>
>> /chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS
>>
>> I clean them out and I keep getting the same file showing up and chukwa
>> doesn't know how to handle it:
>>
>> postProcess.log:
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - main procesing Cluster (_SUCCESS)
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - processClutserDirectory (_SUCCESS,/chukwa/repos//_SUCCESS)
>> 2011-02-21 06:51:55,028 WARN main PostProcessorManager - Error in processDemuxOutput:
>> java.io.IOException: hdfs://cluster1/chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS is not a directory!
>> at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.processClutserDirectory(MoveToRepository.java:54)
>> at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.main(MoveToRepository.java:250)
>> at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.movetoMainRepository(PostProcessorManager.java:201)
>> at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.start(PostProcessorManager.java:146)
>> at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.main(PostProcessorManager.java:80)
>>
>>
>
>

--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
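
For reference, the "skip everything starting with _" idea from this thread could be sketched roughly as below. This is not the actual MoveToRepository code and not a tested patch: the class name `UnderscoreSkip`, both method names, and the `demoCluster` entry are made up for illustration, and plain strings stand in for HDFS paths so the example runs without Hadoop on the classpath. In a real change the same predicate could presumably be wrapped in an `org.apache.hadoop.fs.PathFilter` and handed to `FileSystem.listStatus`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: filters the entries of a demux output directory so that
 * Hadoop bookkeeping entries are never treated as cluster directories.
 * All names here are hypothetical, not taken from Chukwa.
 */
public class UnderscoreSkip {

    /** True for entries such as _SUCCESS, _log and _temporary. */
    public static boolean isHadoopInternal(String name) {
        return name.startsWith("_");
    }

    /** Returns the entry names that do not start with an underscore, in input order. */
    public static List<String> clusterEntries(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!isHadoopInternal(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "demoCluster" is an invented cluster directory name.
        List<String> listing = Arrays.asList("_SUCCESS", "_log", "_temporary", "demoCluster");
        System.out.println(clusterEntries(listing));
    }
}
```

With a check like this applied before the per-cluster move, the `_SUCCESS` marker would simply be ignored instead of triggering the "is not a directory!" IOException shown in the log above. The trade-off raised in the thread still applies: any legitimate cluster directory whose name begins with an underscore would be skipped too.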
