On Jul 23, 2010, at 1:51 PM, Eric Yang wrote:

> On 7/23/10 1:17 PM, "William Bajzek" <williambaj...@gmail.com> wrote:
>
>> On Jul 23, 2010, at 9:58 AM, Eric Yang wrote:
>>
>>> MetricDataLoader can be modified to throw IOException to the executor
>>> class, MetricDataLoaderPool, which can in turn throw it to
>>> PostProcessorManager. PostProcessorManager moves the data to a temp
>>> directory. Retry logic can be added to PostProcessorManager by counting
>>> the number of retries for the errored-out sequence file before sending
>>> it to the InError directory. That would be a better way to manage error
>>> conditions.
>>
>> I'll look into this, thanks. Other than that, is the only recourse to
>> move the failed material back into the queue manually?
>
> Yes. This page contains some useful information on the data flow:
>
> http://wiki.apache.org/hadoop/Chukwa_Processes_and_Data_Flow
>
> In step 3.3, the data moves from the demuxProcessing directory to the
> postProcess directory. If the load fails, move the data back to the
> demuxProcessing directory and PostProcessorManager will pick it up and
> attempt the load again.
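For reference, here is a minimal sketch of the retry-counting pattern Eric describes. The class, method names, retry limit, and directory paths below are hypothetical illustrations, not Chukwa's actual API:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: count failed load attempts per sequence file and
    // give up after a fixed number of retries. Names and paths are made up
    // for illustration; they are not taken from the Chukwa codebase.
    public class RetryCountingSketch {
        private static final int MAX_RETRIES = 3;  // assumed retry limit
        private final Map<String, Integer> retryCounts =
            new HashMap<String, Integer>();

        public void process(String sequenceFile) {
            try {
                loadIntoDatabase(sequenceFile);    // may throw IOException
                retryCounts.remove(sequenceFile);  // success: drop the counter
            } catch (IOException e) {
                Integer previous = retryCounts.get(sequenceFile);
                int attempts = (previous == null ? 0 : previous) + 1;
                if (attempts >= MAX_RETRIES) {
                    // too many failures: park the file for manual inspection
                    moveTo(sequenceFile, "/chukwa/InError");
                    retryCounts.remove(sequenceFile);
                } else {
                    // requeue the file so a later run can try again
                    retryCounts.put(sequenceFile, attempts);
                    moveTo(sequenceFile, "/chukwa/demuxProcessing");
                }
            }
        }

        private void loadIntoDatabase(String file) throws IOException {
            // placeholder for the loader call that can fail (e.g. MySQL down)
            throw new IOException("database unavailable");
        }

        private void moveTo(String file, String directory) {
            // placeholder for the HDFS move of the sequence file
            System.out.println("moving " + file + " to " + directory);
        }
    }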
Thanks. I've worked out a way to recover from this kind of failure, so I
thought I'd post it for posterity. To reproduce the failure, I shut down
MySQL before posting my XML and waited long enough for the post processor
to run and fail. In this example, my data type is called "sample_data_1_0".

To reprocess, first stop the data processors:

  $CHUKWA_HOME/bin/stop-data-processors.sh

Then look in postprocess.log and find a line like this for each failed job:

  /tmp/chukwa/logs/postprocess.log:2010-07-26 13:26:40,509 INFO main MoveToRepository - >>>>>>>>>>>> Before Renamehdfs://localhost:9000/chukwa/postProcess/demuxOutputDir_1280175984809/chukwa/sample_data_1_0/sample_data_1_0_20100726_13_25.R.evt -- /chukwa/repos/chukwa/sample_data_1_0/20100726/13/25/sample_data_1_0_20100726_13_25.1.evt

Then, for each failed event, move the file from the repos directory back
into the postProcess directory:

  bin/hadoop fs -mkdir /chukwa/postProcess/demuxOutputDir_1280175984809/chukwa/sample_data_1_0
  bin/hadoop fs -mv /chukwa/repos/chukwa/sample_data_1_0/20100726/13/25/sample_data_1_0_20100726_13_25.1.evt /chukwa/postProcess/demuxOutputDir_1280175984809/chukwa/sample_data_1_0
  bin/hadoop fs -rmr /chukwa/repos/chukwa/sample_data_1_0

Then run $CHUKWA_HOME/bin/start-data-processors.sh and you should be good
to go.

This process halts the whole system beginning with the demux, but once you
have started everything up again, everything queued up should still get
run, so assuming you fixed the source of the problem, you shouldn't lose
anything. (A sketch of scripting the per-file moves follows below the
signature.)

- William Bajzek
williambaj...@gmail.com
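If you have many failed events, the per-file moves above can be tedious.
Here is a rough sketch of the same recovery using Hadoop's FileSystem API;
the demuxOutputDir name and data type are copied from the example above and
would need to be adjusted for your own failed job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Performs the same moves as the shell commands above: recreate the
    // postProcess directory for the data type, move every .evt file from
    // repos back into it, then remove the repos directory.
    public class RequeueFailedEvents {
        public static void main(String[] args) throws Exception {
            // Assumed values, taken from this example's log line.
            String dataType = "sample_data_1_0";
            Path repos = new Path("/chukwa/repos/chukwa/" + dataType);
            Path postProcess = new Path(
                "/chukwa/postProcess/demuxOutputDir_1280175984809/chukwa/"
                + dataType);

            FileSystem fs = FileSystem.get(new Configuration());
            fs.mkdirs(postProcess);               // bin/hadoop fs -mkdir ...
            moveEvtFiles(fs, repos, postProcess);
            fs.delete(repos, true);               // bin/hadoop fs -rmr ...
        }

        // Walk the date/hour/minute subdirectories under repos and move
        // each .evt file into the postProcess directory.
        private static void moveEvtFiles(FileSystem fs, Path dir, Path dest)
                throws Exception {
            for (FileStatus status : fs.listStatus(dir)) {
                if (status.isDir()) {
                    moveEvtFiles(fs, status.getPath(), dest);
                } else if (status.getPath().getName().endsWith(".evt")) {
                    // bin/hadoop fs -mv <file> <postProcess dir>
                    fs.rename(status.getPath(),
                              new Path(dest, status.getPath().getName()));
                }
            }
        }
    }

Run it (e.g. with hadoop jar) while the data processors are stopped, then
restart them as described above.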