[ https://issues.apache.org/jira/browse/FLUME-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210216#comment-15210216 ]

Jonathan Smith commented on FLUME-2859:
---------------------------------------

This seems to be a fairly common request. I have looked at adding this
functionality to the existing SpoolDir source; however, there are some issues
when it comes to streaming events from a GZip or Tar file in a way that
supports Flume's event guarantees. *Neither of these streams is markable or
resettable*, which is a requirement for the ResettableInputStream used by the
deserializers to read events. The whole point of a stream being resettable is
that we can keep track of where we are in the file, so that if the channel is
full, we can resume reading from the point where the last successful
transaction was committed. The way I got around this was by storing the events
in memory and keeping the channel transaction sizes small.
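
A minimal Java sketch of the problem and the in-memory workaround described above, using only the JDK. The names `GzipEventReplay`, `BoundedChannel`, `commitWithRetry` and `ingest` are hypothetical, and `BoundedChannel` is a toy stand-in for a channel, not Flume's Channel/Transaction API. It shows that `GZIPInputStream` reports `markSupported() == false`, and that a batch read from the decompressed stream can be held in memory and retried when the channel is full, instead of resetting the stream. Tar input is not covered (it would need a third-party archive library).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipEventReplay {

    // Hypothetical stand-in for a channel (NOT Flume's Channel API): a batch is
    // accepted whole, or rejected whole when it would exceed the capacity.
    static final class BoundedChannel {
        final int capacity;
        final List<String> events = new ArrayList<>();

        BoundedChannel(int capacity) { this.capacity = capacity; }

        boolean tryCommit(List<String> batch) {
            if (events.size() + batch.size() > capacity) {
                return false; // "channel full": nothing written, like a rolled-back transaction
            }
            events.addAll(batch);
            return true;
        }
    }

    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // The gzip stream cannot be reset, so a rejected batch is kept in memory and
    // retried after a simulated sink drains the channel.
    static void commitWithRetry(List<String> batch, BoundedChannel channel, List<String> delivered) {
        while (!channel.tryCommit(batch)) {
            delivered.addAll(channel.events);
            channel.events.clear();
        }
    }

    // Reads newline-delimited events from gzip data in batches of batchSize.
    static List<String> ingest(InputStream gz, int batchSize, BoundedChannel channel) throws IOException {
        if (batchSize < 1 || batchSize > channel.capacity) {
            throw new IllegalArgumentException("batch must fit in an empty channel");
        }
        List<String> delivered = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gz), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    commitWithRetry(batch, channel, delivered);
                    batch.clear(); // safe: tryCommit copied the events
                }
            }
        }
        if (!batch.isEmpty()) {
            commitWithRetry(batch, channel, delivered);
        }
        delivered.addAll(channel.events); // final drain
        channel.events.clear();
        return delivered;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("e1\ne2\ne3\ne4\ne5\n");
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            System.out.println(in.markSupported()); // prints false
        }
        System.out.println(ingest(new ByteArrayInputStream(data), 2, new BoundedChannel(3)));
        // prints [e1, e2, e3, e4, e5]
    }
}
```

The retry loop only terminates if a batch fits into an empty channel, which is why the sketch rejects a `batchSize` larger than the capacity. This mirrors the advice to keep transaction sizes small: the whole pending batch lives in memory until it is committed.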

Perhaps we could integrate this sort of functionality into the Spooling
Directory Source as a flag, "readCompressedFiles", with the additional caveat
that it would use more memory when the channel is filling up and the source
must reset often.
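
One way the proposed flag might look, as a sketch only: `readCompressedFiles` comes from the proposal above, but `CompressedFileOpener.open` is a hypothetical helper, not Flume code, and recognising gzip files by a `.gz` suffix is an assumption (checking the gzip magic bytes would be an alternative). It makes the trade-off visible: the plain-file path returns a markable stream, while the gzip path does not, so a reader built on it would need in-memory buffering like the workaround described above.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedFileOpener {

    // Hypothetical helper: readCompressedFiles mirrors the flag proposed in the
    // comment; it is not an existing Flume setting. Files are treated as gzip
    // purely by their ".gz" suffix (an assumption made for this sketch).
    static InputStream open(Path file, boolean readCompressedFiles) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        if (readCompressedFiles && file.getFileName().toString().endsWith(".gz")) {
            try {
                return new GZIPInputStream(raw); // not markable: events must be buffered in memory
            } catch (IOException e) {
                raw.close(); // e.g. bad gzip header: do not leak the file handle
                throw e;
            }
        }
        return raw; // BufferedInputStream supports mark/reset
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("events", ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
            out.write("hello\n".getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = open(tmp, true)) {
            System.out.println(in.markSupported()); // prints false
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints hello
        }
        try (InputStream in = open(tmp, false)) {
            System.out.println(in.markSupported()); // prints true
        }
        Files.delete(tmp);
    }
}
```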

> Flume Spooling Directory Source can't ingest gzip files.
> --------------------------------------------------------
>
>                 Key: FLUME-2859
>                 URL: https://issues.apache.org/jira/browse/FLUME-2859
>             Project: Flume
>          Issue Type: Request
>          Components: Sinks+Sources
>            Reporter: jia.fu
>
> Flume can ingest text files, but we have a large number of gzip files on the
> disks of three servers, and Flume can't ingest them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
