> On June 7, 2016, 1:44 a.m., Mike Percy wrote:
> > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java,
> > line 161
> > <https://reviews.apache.org/r/48161/diff/1/?file=1404557#file1404557line161>
> >
> > nit: spurious parenthesis before lastSeenParentDirMTime
>
> Attila Simon wrote:
> The condition was described in the javadoc; unfortunately it is ugly but
> needed.
How about this?
List<File> getMatchingFiles() {
  long now = System.currentTimeMillis();
  long currentParentDirMTime = parentDir.lastModified();
  // Only check a maximum of once per second.
  if (!cachePatternMatching ||
      (currentParentDirMTime > lastSeenParentDirMTime &&
       TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now)) >
           lastCheckedTime)) {
    lastMatchedFiles = getMatchingFilesNoCache();
    Collections.sort(lastMatchedFiles,
        new TailFile.CompareByLastModifiedTime());
    lastSeenParentDirMTime = currentParentDirMTime;
    lastCheckedTime =
        TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(now));
  }
  return lastMatchedFiles;
}
Except that we should replace the sorting with a helper function that only runs
stat() once per item.
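
Something along these lines might work for the helper. This is only a rough
sketch: the class name MTimeSort and the method sortByLastModified are made up
for illustration and are not part of the patch. The idea is to read
lastModified() exactly once per file into a snapshot, sort the snapshots, and
then unwrap them, so the comparator never touches the filesystem.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (name and shape are assumptions, not code from the patch).
final class MTimeSort {
  private MTimeSort() {}

  /** Pairs a file with the modification time captured when the snapshot was taken. */
  private static final class Entry {
    final File file;
    final long mtime;

    Entry(File file, long mtime) {
      this.file = file;
      this.mtime = mtime;
    }
  }

  /** Returns a new list ordered by ascending mtime; the input list is not modified. */
  static List<File> sortByLastModified(List<File> files) {
    List<Entry> entries = new ArrayList<Entry>(files.size());
    for (File f : files) {
      // The only filesystem call made for this file.
      entries.add(new Entry(f, f.lastModified()));
    }
    // The comparator only looks at the cached values.
    Collections.sort(entries, new Comparator<Entry>() {
      @Override
      public int compare(Entry a, Entry b) {
        return Long.compare(a.mtime, b.mtime);
      }
    });
    List<File> sorted = new ArrayList<File>(entries.size());
    for (Entry e : entries) {
      sorted.add(e.file);
    }
    return sorted;
  }
}
```

A side benefit of sorting on a snapshot is that the ordering stays consistent
even if a file is modified while the sort is running.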
- Mike
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review136086
-----------------------------------------------------------
On June 13, 2016, 2:14 p.m., Attila Simon wrote:
>
> (Updated June 13, 2016, 2:14 p.m.)
>
>
> Review request for Flume.
>
>
> Bugs: FLUME-2918
> https://issues.apache.org/jira/browse/FLUME-2918
>
>
> Repository: flume-git
>
>
> Description
> -------
>
> The way the TailDir source checks which files should be tracked was improved.
> The existing implementation caused unnecessarily high CPU usage for huge (50K+
> files) directories. This fix allows users to eliminate continuous listing of
> the parent directory (on each Source.process invocation) and introduces a more
> performant method for listing and matching files.
>
> used java.nio.file.DirectoryStream to filter files
> made pattern match calculation optionally cached
> added junit tests
> added javadoc
> added license
>
>
> Diffs
> -----
>
>
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
> 5b6d465
>
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
> PRE-CREATION
>
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java
> 8816327
>
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java
> 6165276
>
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
> PRE-CREATION
>
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
> f9e614c
>
> Diff: https://reviews.apache.org/r/48161/diff/
>
>
> Testing
> -------
>
> mvn clean install -DskipTests -> built
> junit tests for flume-taildir-source module -> passed
>
>
> Thanks,
>
> Attila Simon
>
>
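
For readers skimming the description above, the DirectoryStream-based
listing it mentions could look roughly like the following. This is a sketch
under assumptions, not the code in the diff: the class name
DirectoryStreamListing, the method listMatching, and the choice to match a
regex against the file name only are all illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; names are assumptions, not taken from the patch.
final class DirectoryStreamListing {
  private DirectoryStreamListing() {}

  /** Lists regular files directly under parentDir whose names fully match the pattern. */
  static List<File> listMatching(File parentDir, final Pattern fileNamePattern)
      throws IOException {
    DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
      @Override
      public boolean accept(Path entry) {
        // Check the cheap name match first, then confirm it is a regular file.
        return fileNamePattern.matcher(entry.getFileName().toString()).matches()
            && Files.isRegularFile(entry);
      }
    };
    List<File> result = new ArrayList<File>();
    // try-with-resources closes the underlying directory handle.
    try (DirectoryStream<Path> stream =
             Files.newDirectoryStream(parentDir.toPath(), filter)) {
      for (Path entry : stream) {
        result.add(entry.toFile());
      }
    }
    return result;
  }
}
```

The stream yields entries lazily and applies the filter as it iterates, so
non-matching entries are never added to the result list.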