Archive daemon: infinite loop at midnight
-----------------------------------------
Key: CHUKWA-593
URL: https://issues.apache.org/jira/browse/CHUKWA-593
Project: Chukwa
Issue Type: Bug
Components: MR Data Processors
Affects Versions: 0.4.0
Environment: Debian 5.0, Hadoop 0.20
Reporter: Sourygna Luangsay
Priority: Minor
Fix For: 0.5.0, 0.4.0
The Chukwa archive manager daemon enters an infinite loop between midnight and 1 AM.
This increases the NameNode load and greatly inflates both the Chukwa and the
NameNode logs.
The problem seems to come from the start() method of ChukwaArchiveManager.java (in
package org.apache.hadoop.chukwa.extraction.archive). At midnight, there are two
directories in /chukwa/dataSinkArchives/ (one for the previous day and one for the
new day). As a result, neither the "daysInRawArchiveDir.length == 0" condition nor
the "daysInRawArchiveDir.length == 1" one is entered. processDay() is then called,
but it does very little because of the "modificationDate < oneHourAgo" condition.
The loop then starts over without having slept or deleted the previous day's
directory. This repeats for about one hour.
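To make that control flow concrete, here is a hypothetical Java model of a single pass through the loop. It is reconstructed only from the description above, not from the real ChukwaArchiveManager source: the class ArchiveLoopModel and the method passSleeps are illustrative names, and the assumption that the zero-directory branch sleeps is mine. The point it shows is that with two day directories no branch sleeps, so the loop restarts immediately.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of one pass through the archive manager's main loop,
 * reconstructed from the bug description (NOT the real Chukwa source).
 */
public class ArchiveLoopModel {
    static final long ONE_HOUR = TimeUnit.HOURS.toMillis(1);

    /** Returns true if this pass ends by sleeping before the next pass. */
    public static boolean passSleeps(int dayDirs, long now, long lastRun) {
        if (dayDirs == 0) {
            return true; // assumption: nothing to archive, so the daemon sleeps
        }
        if (dayDirs == 1) {
            long nextRun = lastRun + 2 * ONE_HOUR - TimeUnit.MINUTES.toMillis(1); // 2h - 1min
            if (now < nextRun) {
                return true; // ran recently: sleep for 30 minutes
            }
        }
        // Reached when dayDirs >= 2 (e.g. just after midnight) or when the
        // 2h - 1min window has elapsed. With two directories, processDay()
        // skips data modified within the last hour and the loop restarts
        // immediately, without sleeping.
        return false;
    }

    public static void main(String[] args) {
        long lastRun = 0L;
        long now = TimeUnit.MINUTES.toMillis(10); // 10 minutes after the last run
        System.out.println("1 dir  -> sleeps: " + passSleeps(1, now, lastRun));
        System.out.println("2 dirs -> sleeps: " + passSleeps(2, now, lastRun));
    }
}
```

Running main prints "sleeps: true" for one directory and "sleeps: false" for two, which is the spinning behaviour described above.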
Here is how I propose to change the "daysInRawArchiveDir.length == 1" condition
block in the start function:
    // Changed from "== 1" to ">= 1" so that the "ran recently" check also applies
    // when several day directories exist (e.g. just after midnight).
    if (daysInRawArchiveDir.length >= 1) {
      long nextRun = lastRun + (2 * ONE_HOUR) - (1 * 60 * 1000); // 2h - 1min
      if (now < nextRun) {
        log.info("lastRun < 2 hours so skip archive for now, going to sleep for 30 minutes, currentDate is:"
            + new java.util.Date());
        Thread.sleep(30 * 60 * 1000);
        continue;
      }
    }
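A minimal sketch of the patched decision, under the same assumptions as before (hypothetical class PatchedArchiveLoopModel, not the real Chukwa code, and the zero-directory branch assumed to sleep). With the condition widened to ">= 1", two coexisting day directories no longer bypass the 2h - 1min back-off.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the proposed patch: the "ran recently" check applies
 * whenever at least one day directory exists (>= 1 instead of == 1).
 */
public class PatchedArchiveLoopModel {
    static final long ONE_HOUR = TimeUnit.HOURS.toMillis(1);

    /** Earliest time the next archive pass may run: lastRun + 2h - 1min. */
    public static long nextRun(long lastRun) {
        return lastRun + 2 * ONE_HOUR - TimeUnit.MINUTES.toMillis(1);
    }

    /** Returns true if this pass ends by sleeping for 30 minutes. */
    public static boolean passSleeps(int dayDirs, long now, long lastRun) {
        if (dayDirs == 0) {
            return true; // assumption: nothing to archive, so the daemon sleeps
        }
        // Patched condition: also covers yesterday's and today's directories coexisting.
        return now < nextRun(lastRun);
    }

    public static void main(String[] args) {
        long lastRun = 0L;
        System.out.println("nextRun offset (ms): " + nextRun(lastRun)); // 7140000
        long tenMinutes = TimeUnit.MINUTES.toMillis(10);
        System.out.println("2 dirs, 10 min after last run -> sleeps: "
                + passSleeps(2, tenMinutes, lastRun));
        System.out.println("2 dirs, 2 h after last run -> sleeps: "
                + passSleeps(2, 2 * ONE_HOUR, lastRun));
    }
}
```

The offset 7,140,000 ms is 2 h (7,200,000 ms) minus 1 min (60,000 ms). Ten minutes after a run the model now sleeps even with two directories, and once the window has elapsed it proceeds to process the days again.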
In my setup, this change removed the infinite loop. However, there may be a reason to
treat the "1 directory" case separately from the "many directories" case. I have read
the documentation and the Subversion history but could not find one.
If there is such a reason, could someone explain it to me?
Regards.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira