[ https://issues.apache.org/jira/browse/FLUME-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210216#comment-15210216 ]
Jonathan Smith commented on FLUME-2859:
---------------------------------------
This seems to be a fairly common request. I have looked at adding this
functionality to the existing SpoolDir source; however, there are some issues
when it comes to streaming events from a GZip or a Tar file in a way that
supports Flume's event guarantees. *Neither of these streams is markable or
resettable*, which is a requirement for the ResettableInputStream used by the
deserializers to read events. The whole point of a stream being resettable is
that we can keep track of where we are in the file, so that if the channel is
full, we can resume reading from the point where we committed the last
successful transaction. I worked around this by buffering the uncommitted
events in memory and keeping the channel transaction sizes small, so that the
buffer stays small as well.
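To make the workaround concrete, here is a minimal sketch of the buffering idea, not the actual patch. The class name ReplayingGzipLineReader and its methods are hypothetical; it does not implement Flume's ResettableInputStream, and it does not persist its position across restarts. It only shows how lines read since the last commit can be held in memory and replayed after a reset, even though the underlying GZIPInputStream cannot be rewound.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Flume: reads lines from a gzip stream and
// keeps every line handed out since the last commit so it can be replayed.
final class ReplayingGzipLineReader implements Closeable {
    private final BufferedReader reader;
    // Lines read from the stream since the last commit.
    private final List<String> uncommitted = new ArrayList<>();
    // Index in 'uncommitted' of the next line to hand out.
    private int position = 0;

    ReplayingGzipLineReader(InputStream compressed) throws IOException {
        this.reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8));
    }

    // Returns the next line, or null at end of stream. After a reset, lines
    // come from the in-memory buffer before the stream is touched again.
    String readLine() throws IOException {
        if (position < uncommitted.size()) {
            return uncommitted.get(position++);
        }
        String line = reader.readLine();
        if (line != null) {
            uncommitted.add(line);
            position++;
        }
        return line;
    }

    // Call after a successful channel transaction: the lines handed out so
    // far are dropped from memory and can no longer be replayed.
    void commit() {
        uncommitted.subList(0, position).clear();
        position = 0;
    }

    // Call after a failed transaction (e.g. channel full): the next readLine()
    // starts again from the first line after the last commit.
    void reset() {
        position = 0;
    }

    // Number of lines currently held in memory.
    int bufferedLines() {
        return uncommitted.size();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

The buffer grows by one line per read until commit() is called, which is why small transaction sizes keep the memory cost bounded in this sketch.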
Perhaps we could integrate this sort of functionality into the Spooling
Directory Source as a flag, "readCompressedFiles", with the caveat that it uses
more memory when the channel is filling up and the source must reset often.
> flume Spooling Directory Source can't ingested gzip files.
> -------------------------------------------------------------
>
> Key: FLUME-2859
> URL: https://issues.apache.org/jira/browse/FLUME-2859
> Project: Flume
> Issue Type: Request
> Components: Sinks+Sources
> Reporter: jia.fu
>
> Flume can ingest text files, but we have a large number of gzip files on
> the disks of three servers, and Flume can't ingest them.