[ 
https://issues.apache.org/jira/browse/CHUKWA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056834#comment-13056834
 ] 

Eric Yang commented on CHUKWA-593:
----------------------------------

The processDay function should delete the previous day's directory if that 
directory is empty.  During the hour between two days, the system is designed 
to archive the previous day as soon as possible.

Do you have collectors running in multiple time zones, or is a server clock 
out of sync by one hour?

The busy loop should not happen unless something continues to write to the 
previous day's directory.  The daysInRawArchiveDir.length==1 check is there to 
ensure the roll-up for the previous day happens as soon as possible.

If we change it to >=1, then the roll-up for the previous day will not occur 
until 1:59 AM of the current day.  We should avoid this latency if possible.
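
To make the trade-off concrete, here is a minimal sketch of the sleep guard 
only, not the actual ChukwaArchiveManager code.  The nextRun arithmetic is 
taken from the proposed patch quoted below; the class name, the shouldSleep 
helper and its atLeastOne flag are hypothetical names used for illustration.

```java
class ArchiveLoopSketch {
    static final long ONE_HOUR = 60L * 60 * 1000;

    /**
     * Returns true when the loop would sleep instead of calling processDay.
     * atLeastOne=false models the current "== 1" check,
     * atLeastOne=true models the proposed ">= 1" check.
     */
    static boolean shouldSleep(int dayDirCount, long now, long lastRun,
                               boolean atLeastOne) {
        boolean guardApplies = atLeastOne ? dayDirCount >= 1 : dayDirCount == 1;
        long nextRun = lastRun + (2 * ONE_HOUR) - (1 * 60 * 1000); // 2h - 1min
        return guardApplies && now < nextRun;
    }

    public static void main(String[] args) {
        long lastRun = 0L;
        long now = 30L * 60 * 1000; // 30 minutes after the last run

        // Two day directories just after midnight, current "== 1" check:
        // the guard is skipped, so processDay runs right away.
        System.out.println(shouldSleep(2, now, lastRun, false)); // false

        // Same situation with ">= 1": the loop sleeps, so the roll-up of the
        // previous day waits until nextRun.
        System.out.println(shouldSleep(2, now, lastRun, true));  // true

        // A single day directory sleeps under either check.
        System.out.println(shouldSleep(1, now, lastRun, false)); // true
    }
}
```

With "== 1" and two directories the guard never sleeps, which is what allows 
the previous day to be archived immediately.  It is also why the loop spins if 
processDay cannot finish with that directory.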

> Archive daemon: infinite loop at midnight
> -----------------------------------------
>
>                 Key: CHUKWA-593
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-593
>             Project: Chukwa
>          Issue Type: Bug
>          Components: MR Data Processors
>    Affects Versions: 0.4.0
>         Environment: Debian 5.0, Hadoop 0.20
>            Reporter: Sourygna Luangsay
>            Priority: Minor
>             Fix For: 0.4.0, 0.5.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The Chukwa archive manager daemon enters an infinite loop between midnight 
> and 1 AM. This increases the namenode load and causes a huge increase in 
> both Chukwa and namenode logs.
> The problem seems to come from the start function of 
> ChukwaArchiveManager.java (in package 
> org/apache/hadoop/chukwa/extraction/archive). At midnight, we get two 
> directories in /chukwa/dataSinkArchives/ (one for the previous day and one 
> for the new day). This means that we enter neither the 
> "daysInRawArchiveDir.length == 0" condition nor the 
> "daysInRawArchiveDir.length == 1" one. The processDay function is then 
> called, but little is done because of the "modificationDate < oneHourAgo" 
> condition.
> Finally, we loop without having slept or deleted the previous day's 
> directory. This process repeats itself for one hour.
> Here is how I propose to change the "daysInRawArchiveDir.length == 1" 
> condition block in the start function:
>         if (daysInRawArchiveDir.length >= 1) {
>           long nextRun = lastRun + (2*ONE_HOUR) - (1*60*1000); // 2h - 1min
>           if (now < nextRun) {
>             log.info("lastRun < 2 hours so skip archive for now, going to "
>                 + "sleep for 30 minutes, currentDate is:"
>                 + new java.util.Date());
>             Thread.sleep(30 * 60 * 1000);
>             continue;
>           }
>         }
> For me, this removed the infinite loop problem. But maybe there is a reason 
> to separate the "one directory" case from the "many directories" case. I 
> have read the documentation and the Subversion history but could not find 
> one.
> If there is one, could someone explain it to me?
> Regards.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
