-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48161/#review136086
-----------------------------------------------------------




flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
 (line 242)
<https://reviews.apache.org/r/48161/#comment201442>

    Hmm, another system call per file (see other review comments on the code 
path that happens before we get to this point).



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 79)
<https://reviews.apache.org/r/48161/#comment201406>

    Add comments here for the variables that need explanation. For example, 
what are the units on these timestamps? Milliseconds?



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 86)
<https://reviews.apache.org/r/48161/#comment201403>

    nit: semi valid? Sounds like inventing terminology. If so, avoid. Also, 
s/consist/consisting/



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 103)
<https://reviews.apache.org/r/48161/#comment201402>

    why final?



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 113)
<https://reviews.apache.org/r/48161/#comment201405>

    Why not set these default values set in the class definition i.e. private 
long lastSeenParentDirMTime = -1 rather than specifically initializing them in 
the constructor? It's less code and more obvious what is happening.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 124)
<https://reviews.apache.org/r/48161/#comment201411>

    grammar nit: s/which are matching regex pattern passed as object 
instantiation/that match the regex pattern passed in during object 
instantiation/



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 125)
<https://reviews.apache.org/r/48161/#comment201410>

    grammar nit: s/frequently/frequent/



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 127)
<https://reviews.apache.org/r/48161/#comment201413>

    nit: s/instruct/trigger/ here and below



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 129)
<https://reviews.apache.org/r/48161/#comment201412>

    nit: missing period at end of sentence.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 150)
<https://reviews.apache.org/r/48161/#comment201443>

    Why are we providing a "sorted by last (cached) modification time" as a 
guarantee here? Are we using that guarantee somehow? See below on the cost of 
that guarantee.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 156)
<https://reviews.apache.org/r/48161/#comment201439>

    Can this method ever return null? If so, let's make sure that doesn't 
happen. It should return Collections.emptyList() if it can't find anything.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 161)
<https://reviews.apache.org/r/48161/#comment201414>

    nit: spurious parenthesis before lastSeenParentDirMTime



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 164)
<https://reviews.apache.org/r/48161/#comment201440>

    This sorting function seems problematic to me. It can call stat() up to n^2 
times (assuming quicksort). Shouldn't we get the last modified time of each 
file in the list once and then sort it? Why are we sorting, anyway?
    
    By the way, I just checked the OpenJDK source code and indeed every 
File.lastModified() maps to FileSystem.getLastModifiedTime(f) which maps to 
stat(2). See 
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/solaris/native/java/io/UnixFileSystem_md.c#L205



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 165)
<https://reviews.apache.org/r/48161/#comment201430>

    typo: s/lastMachedFiles/lastMatchedFiles/



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 180)
<https://reviews.apache.org/r/48161/#comment201441>

    nit: weakly consistent? Sounds like invented terminology. Let's just say 
not consistent and then describe the guarantees we're providing.



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 198)
<https://reviews.apache.org/r/48161/#comment201431>

    nit: there are two spaces at the end of the string



flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
 (line 205)
<https://reviews.apache.org/r/48161/#comment201433>

    Hmm. What is the purpose of this?



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 38)
<https://reviews.apache.org/r/48161/#comment201434>

    style nit: just use one empty line here



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 47)
<https://reviews.apache.org/r/48161/#comment201419>

    nit: add a line comment summarizing this helper function's job (just a 
short sentence is ok, like "Append a line to the specified file.")



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 49)
<https://reviews.apache.org/r/48161/#comment201420>

    style nit: leave spaces around your brackets. We use the java style guide 
in Flume (admittedly not as consistently as I wish we did, though)



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 55)
<https://reviews.apache.org/r/48161/#comment201421>

    style nit: space after the comma



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 59)
<https://reviews.apache.org/r/48161/#comment201424>

    nit: rename to filesToNames() or something similar



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 62)
<https://reviews.apache.org/r/48161/#comment201425>

    Just curious... how is it possible to get a null here and why can't you 
prevent it? Seems unnecessary to allow returning a list with null values in it.



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 63)
<https://reviews.apache.org/r/48161/#comment201422>

    style nit: space after if and also space after ==



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 64)
<https://reviews.apache.org/r/48161/#comment201423>

    style nit: use curly braces for the whole thing if you've got an else clause



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 74)
<https://reviews.apache.org/r/48161/#comment201418>

    nit: Why + ""?



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 100)
<https://reviews.apache.org/r/48161/#comment201086>

    nit: s/a file were/a file was/
    
    here and on the next line



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 105)
<https://reviews.apache.org/r/48161/#comment201426>

    s/a files/a file/



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 106)
<https://reviews.apache.org/r/48161/#comment201428>

    nit: avoid copy / paste. Use a final variable to hold this repeated 
assertion message, here and below.



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 187)
<https://reviews.apache.org/r/48161/#comment201435>

    nit: s/doesntexists/doesntexist/



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 194)
<https://reviews.apache.org/r/48161/#comment201436>

    style nit: spaces around the +



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 198)
<https://reviews.apache.org/r/48161/#comment201437>

    nit: space after the comma



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 214)
<https://reviews.apache.org/r/48161/#comment201072>

    nit: remove extra lines (1 blank line should be enough for readability)



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
 (line 229)
<https://reviews.apache.org/r/48161/#comment201073>

    nit: remove extra lines



flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
 (line 80)
<https://reviews.apache.org/r/48161/#comment201417>

    nit: spell out "To"


- Mike Percy


On June 2, 2016, 5:08 a.m., Attila Simon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48161/
> -----------------------------------------------------------
> 
> (Updated June 2, 2016, 5:08 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2918
>     https://issues.apache.org/jira/browse/FLUME-2918
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> The way TailDir source checks which files should be tracked was improved. 
> Existing implementation caused unneccessary high CPU usage for huge (+50K 
> files) directories. This fix allows users to eliminate continous listing of 
> parent directory (on each Source.process invocation) and introduce a more 
> performant method for listing&matching files.
> 
> used java.nio.file.DirectoryStream to filter files
> made pattern match calculation optionally cached
> added junit tests
> added javadoc
> added license
> 
> 
> Diffs
> -----
> 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
>  5b6d465 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirMatcher.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java
>  8816327 
>   
> flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java
>  6165276 
>   
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirMatcher.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
>  f9e614c 
> 
> Diff: https://reviews.apache.org/r/48161/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install -DskipTests -> built
> junit tests for flume-taildir-source module -> passed
> 
> 
> Thanks,
> 
> Attila Simon
> 
>

Reply via email to