----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48161/#review136086 -----------------------------------------------------------
flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java (line 242) <https://reviews.apache.org/r/48161/#comment201442> Hmm, another system call per file (see other review comments on the code path that happens before we get to this point). flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 79) <https://reviews.apache.org/r/48161/#comment201406> Add comments here for the variables that need explanation. For example, what are the units on these timestamps? Milliseconds? flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 86) <https://reviews.apache.org/r/48161/#comment201403> nit: semi valid? Sounds like inventing terminology. If so, avoid. Also, s/consist/consisting/ flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 103) <https://reviews.apache.org/r/48161/#comment201402> why final? flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 113) <https://reviews.apache.org/r/48161/#comment201405> Why not set these default values set in the class definition i.e. private long lastSeenParentDirMTime = -1 rather than specifically initializing them in the constructor? It's less code and more obvious what is happening. flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 124) <https://reviews.apache.org/r/48161/#comment201411> grammar nit: s/which are matching regex pattern passed as object instantiation/that match the regex pattern passed in during object instantiation/ flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 125) <https://reviews.apache.org/r/48161/#comment201410> grammar nit: s/frequently/frequent/ flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 127) <https://reviews.apache.org/r/48161/#comment201413> nit: s/instruct/trigger/ here and below flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 129) <https://reviews.apache.org/r/48161/#comment201412> nit: missing period at end of sentence. flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 150) <https://reviews.apache.org/r/48161/#comment201443> Why are we providing a "sorted by last (cached) modification time" as a guarantee here? Are we using that guarantee somehow? See below on the cost of that guarantee. flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 156) <https://reviews.apache.org/r/48161/#comment201439> Can this method ever return null? If so, let's make sure that doesn't happen. It should return Collections.emptyList() if it can't find anything. flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 161) <https://reviews.apache.org/r/48161/#comment201414> nit: spurious parenthesis before lastSeenParentDirMTime flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 164) <https://reviews.apache.org/r/48161/#comment201440> This sorting function seems problematic to me. It can call stat() up to n^2 times (assuming quicksort). Shouldn't we get the last modified time of each file in the list once and then sort it? Why are we sorting, anyway? By the way, I just checked the OpenJDK source code and indeed every File.lastModified() maps to FileSystem.getLastModifiedTime(f) which maps to stat(2). See https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/solaris/native/java/io/UnixFileSystem_md.c#L205 flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 165) <https://reviews.apache.org/r/48161/#comment201430> typo: s/lastMachedFiles/lastMatchedFiles/ flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 180) <https://reviews.apache.org/r/48161/#comment201441> nit: weakly consistent? Sounds like invented terminology. Let's just say not consistent and then describe the guarantees we're providing. flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 198) <https://reviews.apache.org/r/48161/#comment201431> nit: there are two spaces at the end of the string flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java (line 205) <https://reviews.apache.org/r/48161/#comment201433> Hmm. What is the purpose of this? flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 38) <https://reviews.apache.org/r/48161/#comment201434> style nit: just use one empty line here flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 47) <https://reviews.apache.org/r/48161/#comment201419> nit: add a line comment summarizing this helper function's job (just a short sentence is ok, like "Append a line to the specified file.") flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 49) <https://reviews.apache.org/r/48161/#comment201420> style nit: leave spaces around your brackets. We use the java style guide in Flume (admittedly not as consistently as I wish we did, though) flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 55) <https://reviews.apache.org/r/48161/#comment201421> style nit: space after the comma flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 59) <https://reviews.apache.org/r/48161/#comment201424> nit: rename to filesToNames() or something similar flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 62) <https://reviews.apache.org/r/48161/#comment201425> Just curious... how is it possible to get a null here and why can't you prevent it? Seems unnecessary to allow returning a list with null values in it. flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 63) <https://reviews.apache.org/r/48161/#comment201422> style nit: space after if and also space after == flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 64) <https://reviews.apache.org/r/48161/#comment201423> style nit: use curly braces for the whole thing if you've got an else clause flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 74) <https://reviews.apache.org/r/48161/#comment201418> nit: Why + ""? flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 100) <https://reviews.apache.org/r/48161/#comment201086> nit: s/a file were/a file was/ here and on the next line flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 105) <https://reviews.apache.org/r/48161/#comment201426> s/a files/a file/ flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 106) <https://reviews.apache.org/r/48161/#comment201428> nit: avoid copy / paste. Use a final variable to hold this repeated assertion message, here and below. flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 187) <https://reviews.apache.org/r/48161/#comment201435> nit: s/doesntexists/doesntexist/ flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 194) <https://reviews.apache.org/r/48161/#comment201436> style nit: spaces around the + flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 198) <https://reviews.apache.org/r/48161/#comment201437> nit: space after the comma flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 214) <https://reviews.apache.org/r/48161/#comment201072> nit: remove extra lines (1 blank line should be enough for readability) flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java (line 229) <https://reviews.apache.org/r/48161/#comment201073> nit: remove extra lines flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java (line 80) <https://reviews.apache.org/r/48161/#comment201417> nit: spell out "To" - Mike Percy On June 2, 2016, 5:08 a.m., Attila Simon wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48161/ > ----------------------------------------------------------- > > (Updated June 2, 2016, 5:08 a.m.) > > > Review request for Flume. > > > Bugs: FLUME-2918 > https://issues.apache.org/jira/browse/FLUME-2918 > > > Repository: flume-git > > > Description > ------- > > The way TailDir source checks which files should be tracked was improved. > Existing implementation caused unneccessary high CPU usage for huge (+50K > files) directories. This fix allows users to eliminate continous listing of > parent directory (on each Source.process invocation) and introduce a more > performant method for listing&matching files. > > used java.nio.file.DirectoryStream to filter files > made pattern match calculation optionally cached > added junit tests > added javadoc > added license > > > Diffs > ----- > > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java > 5b6d465 > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java > PRE-CREATION > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java > 8816327 > > flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java > 6165276 > > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java > PRE-CREATION > > flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java > f9e614c > > Diff: https://reviews.apache.org/r/48161/diff/ > > > Testing > ------- > > mvn clean install -DskipTests -> built > junit tests for flume-taildir-source module -> passed > > > Thanks, > > Attila Simon > >
