[ 
https://issues.apache.org/jira/browse/FLUME-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138721#comment-14138721
 ] 

Gaurav Kumar commented on FLUME-2066:
-------------------------------------

Are there plans to fix this issue? I see the exception when I copy a large 
file (1 GB+) into the spool directory with the cp command, for example: 
{{cp /sourceDir/LargeFile.txt /flumeSpoolDir}}

What is probably happening is that cp writes the file buffer by buffer, so 
the file's size keeps changing while Flume is already reading it, and that 
triggers the error condition. With smaller files, the copy has finished 
before Flume even detects the new file.

To work around this issue, I am streaming the large file with the nc command 
to Flume's netcat source. Are there better alternatives? 
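
One alternative that may be worth trying is to copy the file into a staging 
directory first and then rename it into the spool directory, so that Flume 
only ever sees the complete file. The sketch below is a minimal illustration 
of that idea, not something taken from Flume itself. The class name 
SpoolDirDrop, the method drop and the stagingDir parameter are hypothetical, 
and it assumes the staging directory is outside the spool directory but on 
the same filesystem.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Flume.
final class SpoolDirDrop {

    /**
     * Copies source into stagingDir, then renames the finished copy into
     * spoolDir. Assumes stagingDir is outside spoolDir and on the same
     * filesystem, so that the final step is a plain rename.
     */
    public static Path drop(Path source, Path stagingDir, Path spoolDir)
            throws IOException {
        Path target = spoolDir.resolve(source.getFileName());
        // The spool directory expects uniquely named files, so refuse to
        // overwrite anything that is already there.
        if (Files.exists(target)) {
            throw new FileAlreadyExistsException(target.toString());
        }

        // Slow part: the file grows here, where nothing is watching it.
        Path staged = stagingDir.resolve(source.getFileName());
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Fast part: the complete file appears in the spool directory in a
        // single step.
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the two directories are on different filesystems, Files.move with 
ATOMIC_MOVE throws AtomicMoveNotSupportedException instead of silently 
falling back to a copy, which is the behaviour wanted here. From a shell, the 
equivalent would be cp into the staging directory followed by mv into the 
spool directory.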

> Spool directory source can get stuck in a "Serializer has been closed" loop 
> when retireCurrentFile throws an exception
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2066
>                 URL: https://issues.apache.org/jira/browse/FLUME-2066
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0, v1.3.1
>            Reporter: Phil Scala
>            Assignee: Phil Scala
>
> The following two Java files have similar code and are affected by this 
> issue: 
> 1.3.1: SpoolingFileLineReader.java 
> 1.4: ReliableSpoolingFileEventReader.java 
> retireCurrentFile is called by one caller (readLines in 1.3.1 and readEvents 
> in 1.4): 
> {code:java} 
> retireCurrentFile(); 
> currentFile = getNextFile(); 
> if (!currentFile.isPresent()) { 
>   return Collections.emptyList(); 
> } 
> {code} 
> If retireCurrentFile throws an exception after closing the reader (there are 
> a few causes for an exception to be raised, which are described below), then 
> currentFile still points to the file that failed to retire. This causes 
> subsequent calls to readLines/readEvents to raise a "Serializer has been 
> closed" exception. At this point the application needs to be shut down in 
> order to rectify the problem. If Flume is left running for a while, the logs 
> are littered with the error, so you have to go back to the first error logged 
> to understand what happened. 
> *Exceptions raised in "retireCurrentFile()"* 
> IllegalStateException when the file's modified date changes 
> IllegalStateException when the file's size changes 
> IllegalStateException when renaming the current file and the target file 
> already exists (with different sizes) 
> IllegalStateException when renaming the current file and the target file 
> already exists [non-Windows] 
> FlumeException when renameTo does not return true 
> The documentation does say: 
> *Warning This channel expects that only immutable, uniquely named files are 
> dropped in the spooling directory. If duplicate names are used, or files are 
> modified while being read, the source will fail with an error message *
> I am not sure, however, that the intention was to get caught in the 
> "Serializer has been closed" loop. Three possible solutions: 
> 1. Re-spool the retired file. This will cause duplicates and could get caught 
> in a loop of constantly spooling this file. 
> 2. Log an error and continue spooling the next files. 
> 3. Shut down. 
> I like option 2.
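
As a rough illustration of option 2 in the quoted description, the sketch 
below clears the current-file reference before attempting to retire the 
file, so a failed retire is recorded and the reader moves on instead of 
coming back to an already closed file. This is a simplified model, not the 
Flume code: SkippingReader, SpooledFile, fileOf and errors are hypothetical 
names, files are simulated in memory, and java.util.Optional stands in for 
whatever Optional type the real classes use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical model of "log an error and continue"; not a Flume class.
final class SkippingReader {

    /** Stand-in for an open spooled file. */
    public interface SpooledFile {
        List<String> readLines(int max); // empty list once exhausted
        void retire();                   // may throw IllegalStateException
    }

    private final Deque<SpooledFile> pending;
    private final List<String> errors = new ArrayList<>();
    private Optional<SpooledFile> currentFile = Optional.empty();

    public SkippingReader(Collection<SpooledFile> files) {
        this.pending = new ArrayDeque<>(files);
    }

    public List<String> readLines(int max) {
        while (true) {
            if (!currentFile.isPresent()) {
                currentFile = Optional.ofNullable(pending.poll());
                if (!currentFile.isPresent()) {
                    return Collections.emptyList(); // nothing left to spool
                }
            }
            List<String> lines = currentFile.get().readLines(max);
            if (!lines.isEmpty()) {
                return lines;
            }
            // File exhausted. Drop the reference first, so that a failed
            // retire cannot leave currentFile pointing at a closed file.
            SpooledFile finished = currentFile.get();
            currentFile = Optional.empty();
            try {
                finished.retire();
            } catch (IllegalStateException e) {
                errors.add(e.getMessage()); // a real reader would log this
            }
        }
    }

    public List<String> errors() {
        return errors;
    }

    /** Builds an in-memory file for experimenting with the reader. */
    public static SpooledFile fileOf(List<String> lines, boolean failOnRetire) {
        Deque<String> remaining = new ArrayDeque<>(lines);
        return new SpooledFile() {
            @Override
            public List<String> readLines(int max) {
                List<String> out = new ArrayList<>();
                while (out.size() < max && !remaining.isEmpty()) {
                    out.add(remaining.poll());
                }
                return out;
            }

            @Override
            public void retire() {
                if (failOnRetire) {
                    throw new IllegalStateException("file changed while being read");
                }
            }
        };
    }
}
```

The trade-off is that a file whose retire step fails is left in place 
without its completed marker, so something still has to notice the logged 
error and clean that file up.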



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
