zzf created FLUME-3320:
--------------------------

             Summary: Taildir source reset position bug
                 Key: FLUME-3320
                 URL: https://issues.apache.org/jira/browse/FLUME-3320
             Project: Flume
          Issue Type: Bug
          Components: Master, Sinks+Sources
    Affects Versions: 1.8.0, 1.7.0
            Reporter: zzf
         Attachments: error.log

1) background

We use Flume to transport logs from a local file to Kafka.

Every night at midnight, the local file is moved to another directory and compressed as a backup; after that, the old file's inode is released. A new log file with the same filename is then created in the monitored directory. Since Flume tracks files by inode, we had never seen any conflict before.
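For reference, here is a minimal, self-contained sketch of how the inode can be read on Linux via java.nio; I believe this is essentially what the Taildir source's getInode() does to identify files across the nightly rename, but treat it as an illustration rather than the actual Flume code (the path is just our example log file):

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InodeCheck {
    public static void main(String[] args) throws Exception {
        // The rotated file and the freshly created file share this name,
        // but get different inodes once the old one is moved away.
        Path log = Paths.get("/data/logs/nginx/stm-collect.cn.miaozhen.com.log");

        // "unix:ino" is only available on POSIX file systems (Linux in our case).
        long inode = (Long) Files.getAttribute(log, "unix:ino");
        System.out.println("file: " + log + ", inode: " + inode);
    }
}
{code}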

2) problem

2019-02-23 23:59:34,157 (Log-BackgroundWorker-c1) [INFO - 
org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint(EventQueueBackingStoreFile.java:230)]
 Start checkpoint for /data/flume/filechannel/checkpoint/checkpoint, elements 
to sync = 3406
2019-02-23 23:59:34,167 (Log-BackgroundWorker-c1) [INFO - 
org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:255)]
 Updating checkpoint metadata: logWriteOrderID: 1550778307223, queueSize: 0, 
queueHead: 1754854
2019-02-23 23:59:34,176 (Log-BackgroundWorker-c1) [INFO - 
org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1065)] Updated 
checkpoint for file: /data/flume/filechannel/data/log-68 position: 149727224 
logWriteOrderID: 1550778307223
*2019-02-24 00:00:01,550 (PollableSourceRunner-TaildirSource-r1) [INFO - 
org.apache.flume.source.taildir.ReliableTaildirEventReader.updateTailFiles(ReliableTaildirEventReader.java:257)]
 Pos 15927858851 is larger than file size! Restarting from pos 0, file: 
/data/logs/nginx/stm-collect.cn.miaozhen.com.log, inode: 3932190*
*2019-02-24 00:00:01,550 (PollableSourceRunner-TaildirSource-r1) [INFO - 
org.apache.flume.source.taildir.TailFile.updatePos(TailFile.java:126)] Updated 
position, file: /data/logs/nginx/stm-collect.cn.miaozhen.com.log, inode: 
3932190, pos: 0*
2019-02-24 00:00:04,177 (Log-BackgroundWorker-c1) [INFO - 
org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint(EventQueueBackingStoreFile.java:230)]
 Start checkpoint for /data/flume/filechannel/checkpoint/checkpoint, elements 
to sync = 33326
2019-02-24 00:00:04,189 (Log-BackgroundWorker-c1) [INFO - 
org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:255)]
 Updating checkpoint metadata: logWriteOrderID: 1550778350086, queueSize: 
24452, queueHead: 1763726
2019-02-24 00:00:04,213 (Log-BackgroundWorker-c1) [INFO - 
org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1065)] Updated 
checkpoint for file: /data/flume/filechannel/data/log-68 position: 190948775 
logWriteOrderID: 1550778350086

Normally, Flume opens the new file and gets its inode. If the inode is already in the Map<inode, TailFile> and that TailFile's position is larger than the file size, the position is reset to 0. The system ran this way without problems for a long time.
But this time, Flume failed to pick up the new file's inode. In the log above, you can see there is no 'Opening file ...' message. Somehow the old log file's position was reset instead, and all the data from Feb 23 was sent twice. A simplified sketch of the relevant logic follows below.
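To show where the reset happens, here is a simplified, self-contained sketch of the per-file inode lookup and position check as I read ReliableTaildirEventReader.updateTailFiles() in 1.7/1.8. TailState is a stand-in for Flume's TailFile, and the sketch keeps only the two branches relevant here (the real code also checks lastModified and handles skipToEnd):

{code:java}
import java.io.File;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

public class TaildirResetSketch {

    /** Stand-in for org.apache.flume.source.taildir.TailFile. */
    static class TailState {
        long pos;
        String path;
        TailState(String path, long pos) { this.path = path; this.pos = pos; }
    }

    static final Map<Long, TailState> tailFiles = new HashMap<>();

    static long getInode(File f) throws Exception {
        return (Long) Files.getAttribute(f.toPath(), "unix:ino");
    }

    /** Simplified version of the per-file logic in updateTailFiles(). */
    static void updateTailFile(File f) throws Exception {
        long inode = getInode(f);
        TailState tf = tailFiles.get(inode);

        if (tf == null || !tf.path.equals(f.getAbsolutePath())) {
            // Unknown inode (or inode reused under a new path): open the file
            // from position 0 -> this is where Flume logs "Opening file ...".
            tf = new TailState(f.getAbsolutePath(), 0);
            System.out.println("Opening file: " + f + ", inode: " + inode + ", pos: 0");
        } else if (f.length() < tf.pos) {
            // Known inode whose recorded position exceeds the current file size:
            // this is the branch that produced the log lines above.
            System.out.println("Pos " + tf.pos + " is larger than file size! "
                    + "Restarting from pos 0, file: " + tf.path + ", inode: " + inode);
            tf.pos = 0;
        }
        tailFiles.put(inode, tf);
    }

    public static void main(String[] args) throws Exception {
        updateTailFile(new File("/data/logs/nginx/stm-collect.cn.miaozhen.com.log"));
    }
}
{code}

In our incident only the second branch fired; the 'Opening file ...' line for the new inode never appeared, yet the old inode's position was reset to 0.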

3) question
Logically, Flume relates the real file to its TailFile object by inode. Even granting that it somehow got the old file's inode, I still cannot figure out why tailFile.getPos() would be larger than f.length().
Has this happened before? Should I blame Java NIO?


