[ 
https://issues.apache.org/jira/browse/FLUME-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417283#comment-15417283
 ] 

Andrea Rota commented on FLUME-2911:
------------------------------------

Hello [~bessbd], of course I can.

Assume you have a folder where many processes write files, and among these 
files are some .log files that you want to transmit with Flume. The 
processes are not under your control, and they can pollute the folder with 
other file types, such as .tmp files, .txt files, .dat files and so on.

Since you do not control these processes, you cannot tell in advance which 
kinds of files you will need to ignore, but you do know which ones you want 
to keep. This is a real-world example: we have processes written by third 
parties over which we have no control.

If you configure Flume with {{ignorePattern = ^.\*\.(tmp|txt|dat)$}}, you 
will transmit .log files, but you may also send any other garbage file that you 
did not consider while writing the regex. With the proposed 
{{includePattern}}, instead, you would just declare {{includePattern = 
^.\*\.log$}}.
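
To make the difference concrete, here is a minimal Java sketch. It is not 
Flume code: the class, the method names and the sample file names are made up 
for illustration, and it assumes a pattern has to match the whole file name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PatternComparison {
    // Deny-list: ignore only the extensions we happened to think of.
    static final Pattern IGNORE = Pattern.compile("^.*\\.(tmp|txt|dat)$");
    // Allow-list: keep only what we know we want.
    static final Pattern INCLUDE = Pattern.compile("^.*\\.log$");

    // Files that survive when only an ignore pattern is configured.
    static List<String> keptByIgnore(List<String> names) {
        return names.stream()
                .filter(n -> !IGNORE.matcher(n).matches())
                .collect(Collectors.toList());
    }

    // Files that survive when only an include pattern is configured.
    static List<String> keptByInclude(List<String> names) {
        return names.stream()
                .filter(n -> INCLUDE.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical folder content; "core.dump" is the unforeseen file type.
        List<String> names =
                Arrays.asList("app.log", "scratch.tmp", "notes.txt", "core.dump");
        System.out.println(keptByIgnore(names));   // prints [app.log, core.dump]
        System.out.println(keptByInclude(names));  // prints [app.log]
    }
}
```

The ignore-based filter lets the unexpected core.dump through, while the 
include-based filter keeps only the .log file.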

Of course you can negate the include pattern regex and use it as the ignore 
pattern, as explained in 
http://stackoverflow.com/questions/2637675/how-to-negate-the-whole-regex but 
that negative lookahead is quite tricky, and applying a double negation 
(ignore + negative lookahead) sounds unnatural to me.
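
For reference, this is roughly what the double negation looks like. Again, 
this is only a sketch with hypothetical names (not Flume code), assuming the 
ignore pattern is matched against the whole file name:

```java
import java.util.regex.Pattern;

public class NegatedIgnore {
    // Negative lookahead: matches every name that does NOT end in ".log".
    static final Pattern IGNORE_NOT_LOG = Pattern.compile("^(?!.*\\.log$).*$");

    // A file is kept when the ignore pattern does not match it,
    // i.e. "not (not a .log file)".
    static boolean isKept(String name) {
        return !IGNORE_NOT_LOG.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isKept("app.log"));    // prints true
        System.out.println(isKept("core.dump"));  // prints false
    }
}
```

It gives the same result as {{includePattern = ^.\*\.log$}}, but the reader 
has to undo two negations to see that.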

What do you think?

> Add includePattern option in SpoolDirectorySource configuration
> ---------------------------------------------------------------
>
>                 Key: FLUME-2911
>                 URL: https://issues.apache.org/jira/browse/FLUME-2911
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: notrack, v1.6.0, v1.7.0
>            Reporter: Andrea Rota
>              Labels: features
>         Attachments: FLUME-2911.patch
>
>
> The current implementation of SpoolDirectorySource does not allow users to 
> specify a regex pattern to select which files should be monitored. Instead, 
> it only allows users to specify which files should *not* be 
> monitored, via the ignorePattern parameter.
> I implemented the feature, allowing users to specify the include pattern as 
> {{a1.sources.src-1.includePattern=^foo.*$}} (includes all files whose names 
> start with "foo").
> By default, the includePattern regex is set to {{^.*$}} (all files). Include 
> and exclude patterns can be used at the same time and the results are combined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
