[
https://issues.apache.org/jira/browse/FLUME-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417283#comment-15417283
]
Andrea Rota commented on FLUME-2911:
------------------------------------
Hello [~bessbd], of course I can.
Assume you have a folder where many processes write files, and among these
files are some .log files that you want to transmit with Flume. The
processes are not under your control, and they can pollute the folder with
other file types, such as .tmp files, .txt files, .dat files and so on.
Since you do not control these processes, you cannot tell in advance which
kinds of files you want to ignore, but you do know which ones you want to
keep. This is a real-world example: we have processes written by third
parties over which we have no control.
If you configure Flume with {{ignorePattern = ^.*\.(tmp|txt|dat)$}}, you
will transmit .log files, but you may also send any other garbage file that you
did not consider while writing the regex. Instead, if you can use the
proposed {{includePattern}}, you would just declare {{includePattern =
^.*\.log$}}.
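
To make the difference concrete, here is a minimal Java sketch that applies both approaches to the same list of file names. It is not Flume code: the class name, the method names and the sample file names are hypothetical, and it assumes a pattern is tested against the whole file name. The ignore pattern is written as a grouped alternation over the three extensions mentioned above, which is an assumption about the intended meaning.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of Flume.
public class PatternFilterSketch {
    // Deny-list: skip only the extensions we thought of in advance.
    static final Pattern IGNORE = Pattern.compile("^.*\\.(tmp|txt|dat)$");
    // Allow-list: keep only the files we know we want.
    static final Pattern INCLUDE = Pattern.compile("^.*\\.log$");

    // Files that survive when only the ignore pattern is configured.
    static List<String> keptByIgnore(List<String> names) {
        return names.stream()
                .filter(n -> !IGNORE.matcher(n).matches())
                .collect(Collectors.toList());
    }

    // Files that survive when only the include pattern is configured.
    static List<String> keptByInclude(List<String> names) {
        return names.stream()
                .filter(n -> INCLUDE.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // dump.bak stands for a file type nobody anticipated.
        List<String> names = Arrays.asList("app.log", "scratch.tmp", "notes.txt", "dump.bak");
        System.out.println(keptByIgnore(names));   // prints [app.log, dump.bak]
        System.out.println(keptByInclude(names));  // prints [app.log]
    }
}
```

The unanticipated dump.bak slips through the ignore-only configuration, while the include-only configuration keeps just the .log file.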
Of course, you can negate the include pattern regex and use it as the ignore
pattern, as explained in
http://stackoverflow.com/questions/2637675/how-to-negate-the-whole-regex but
that negative lookahead is quite tricky, and applying a double negation (ignore +
negative lookahead) seems unnatural to me.
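
For reference, the negated form could look like the sketch below. The class and method names are hypothetical, the lookahead regex is my own rendering of the technique from that link, and the code again assumes whole-name matching. It shows that the two formulations select the same files, at the cost of a harder-to-read pattern.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not part of Flume.
public class NegatedIncludeSketch {
    // Direct form: keep names ending in ".log".
    static final Pattern INCLUDE = Pattern.compile("^.*\\.log$");
    // Double negation: matches every name that does NOT end in ".log",
    // so using it as an ignore pattern leaves only the ".log" files.
    static final Pattern NEGATED_AS_IGNORE = Pattern.compile("^(?!.*\\.log$).*$");

    static boolean keptByInclude(String name) {
        return INCLUDE.matcher(name).matches();
    }

    static boolean keptByNegatedIgnore(String name) {
        return !NEGATED_AS_IGNORE.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"app.log", "scratch.tmp", "dump.bak"}) {
            // Both columns agree for every name.
            System.out.println(name + " " + keptByInclude(name) + " " + keptByNegatedIgnore(name));
        }
    }
}
```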
What do you think?
> Add includePattern option in SpoolDirectorySource configuration
> ---------------------------------------------------------------
>
> Key: FLUME-2911
> URL: https://issues.apache.org/jira/browse/FLUME-2911
> Project: Flume
> Issue Type: Improvement
> Components: Sinks+Sources
> Affects Versions: notrack, v1.6.0, v1.7.0
> Reporter: Andrea Rota
> Labels: features
> Attachments: FLUME-2911.patch
>
>
> The current implementation of SpoolDirectorySource does not allow users to
> specify a regex pattern to select which files should be monitored. Instead,
> it allows users to specify which files should *not* be
> monitored, via the ignorePattern parameter.
> I implemented the feature, allowing users to specify the include pattern as
> {{a1.sources.src-1.includePattern=^foo.*$}} (includes all files whose names
> start with "foo").
> By default, the includePattern regex is set to {{^.*$}} (all files). Include
> and exclude patterns can be used at the same time and the results are combined.