[ 
https://issues.apache.org/jira/browse/CHUKWA-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

IvyTang updated CHUKWA-646:
---------------------------

    Status: Open  (was: Patch Available)
    
> FileTailingAdaptor can't handle the file rotating rightly and the checkpoint 
> size is wrong
> ------------------------------------------------------------------------------------------
>
>                 Key: CHUKWA-646
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-646
>             Project: Chukwa
>          Issue Type: Bug
>          Components: Data Collection
>    Affects Versions: 0.5.0, 0.4.0
>         Environment: OpenJDK 64-Bit 1.6.0-20
> Linux 2.6.18-308.1.1.el5
>            Reporter: IvyTang
>              Labels: patch
>             Fix For: 0.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Our team has used chukwa CharFileTailingAdaptorUTF8 to collect the log4j 
> rotated log files for several months.It does help us to collect the logs from 
> everywhere to our hadoop center.
> During the work , we met several problems . And i have raised them in this 
> mail list , but i still haven't got a good solution.
> So we  read the source code , and did some changes
> Our log files are generated by the log4j ,and the log4j appender is 
> org.apache.log4j.DailyRollingFileAppender.
> If you use log4j to generate the rotated log ,may this mail will help you.
> These two problems are the causes why we have to modify the source code.
> 1. The mismatching checkpoint size and file size.
>      I raised this problem in May 14 ,"the check point offset is bigger than 
> the log file size". And Ariel Rabkin  and Eric have answered my question , 
> thanks for your replies.
>      When chukwa starts, it will read the the check point file , let the size 
> be the filereadoffset. The size in the checkpoint indicates how many bytes 
> the adaptor has send .
>      If the log source is stream or a file won't rotate , this size is right 
> ,it indeed is the filereadoffset.But the file is rorated , the checkpoint 
> size is often bigger than the file size ,and this will cause chukwa resend 
> all the log file.
>      So we add a "log.info("chunk seqID:"+c.getSeqID());" in 
> ChukwaHttpSender:send.
>     The seqid is the offset of the send chunks in this log file.
>     So when we need to restart the chukwa, we just need to stop the chukwa , 
> change the size in checkpoint to the last chunk seqid in log and start 
> chukwa.   
>       We also can directly apply the seqID to checkpoint size ,but we don't 
> know if this will cause other problems.
> 2. The method tailFile in FileTailingAdaptor is the core code of collecting 
> the log. The code use the fileReadOffset , file length to detect the rotated 
> file.
>         RandomAccessFile newReader = new RandomAccessFile(toWatch, "r");
>         len = reader.length();
>         long newLength = newReader.length();
>         if (newLength < len && fileReadOffset >= len) {
>           if (reader != null) {
>             reader.close();
>           }
>           
>           reader = newReader;
>           fileReadOffset = 0L;
>           log.debug("Adaptor|"+ adaptorID + "| File size mismatched, 
> rotating: "
>               + toWatch.getAbsolutePath());
>         } else {
>           try {
>             if (newReader != null) {
>               newReader.close();
>             }
>             newReader =null;
>           } catch (Throwable e) {
>             // do nothing.
>           }
>         }
>      This arithmetic does work in most cases. But there is a case ,that when 
> chukwa starts , the log file is 0 and it will be 0 untill it has been 
> rotated. After it has been rotated ,becase its size is 0 ,this log will be 
> removed. A new file has generated , and its size isn't 0.
>       But the len is still 0 ,newLength is > 0.So this contition  if 
> (newLength < len && fileReadOffset >= len)  will never be archived. The new 
> log file will never be detected.
>        So we changed the implemention of this method, we use timestamp to 
> detect the new log file.The lastSlurpTime is the timestamp of the last slurp 
> ,it is been declared and assigned in LWFTAdaptor .   
>        try {
>                 len = reader.length();
>                 if(lastSlurpTime == 0){
>                     lastSlurpTime = System.currentTimeMillis();
>                 }
>                 if (offsetOfFirstByte > fileReadOffset) {
>                     // If the file rotated, the recorded offsetOfFirstByte is 
> greater than
>                     // file size,reset the first byte position to beginning 
> of the file.
>                     fileReadOffset = 0;
>                     offsetOfFirstByte = 0L;
>                     log.warn("offsetOfFirstByte>fileReadOffset, resetting 
> offset to 0");
>                 }                    
>                 if (len == fileReadOffset) {
>                     File fixedNameFile = new File(toWatch.getAbsolutePath());
>                     long fixedNameLastModified = fixedNameFile.lastModified();
>                     if (fixedNameLastModified > lastSlurpTime) {
>                         // If len == fileReadOffset,the file stops rolling 
> log or the file has rotated.
>                         // But fixedNameLastModified > lastSlurpTime , this 
> means after the last slurping,the file has been written .
>                         // so the file has been rotated.
>                         boolean hasLeftData = true;
>                         while(hasLeftData){// read the possiblly generated log
>                             hasLeftData = slurp(len, reader);
>                         }
>                         RandomAccessFile newReader = new 
> RandomAccessFile(toWatch, "r");
>                         if (reader != null) {
>                             reader.close();
>                         }
>                         reader = newReader;
>                         fileReadOffset = 0L;
>                         len = reader.length();
>                         log.debug("Adaptor|" + adaptorID + "| File size 
> mismatched, rotating: " + toWatch.getAbsolutePath());                         
>                       
>                     }
>                     hasMoreData = slurp(len, reader);
>                 } else if (len < fileReadOffset) {
>                     // file has rotated and no detection
>                     if (reader != null) {
>                         reader.close();
>                     }
>                     reader = null;
>                     fileReadOffset = 0L;
>                     offsetOfFirstByte = 0L;
>                     hasMoreData = true;
>                     log.warn("Adaptor|" + adaptorID + "| file: " + 
> toWatch.getPath()
>                             + ", has rotated and no detection - reset 
> counters to 0L");
>                 } else {                              
>                     hasMoreData = slurp(len, reader);
>                 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to