-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/#review12907
-----------------------------------------------------------


Sorry for the delay in this review, I was out of town last week. There is still 
an issue with the BufferedReader / mark / reset semantics, please see below for 
details.


flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java
<https://reviews.apache.org/r/6377/#comment27742>

    Please add an note to the javadocs saying that the SpoolingFileLineReader 
class is ONLY for internal use by Flume components, not for developers 
extending Flume.



flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java
<https://reviews.apache.org/r/6377/#comment27745>

    If the line length exceeds the length of the BufferedReader, this reset() 
call will throw a java.io.IOException: Mark invalid
    
    Also, committed = true when you hit this case, since committed = false does 
not get set until right before the return statement.
    
    Therefore, assuming the client catches the exception and recovers from it, 
the next call to readLines(n) will simply skip over those lines and the data 
will be lost.
    
    So if there is an error in which the line is too long, since we have 
decided that we going to fail then we should fail explicitly and permanently:
    1. Throw a FlumeException indicating a too-long line was seen (do not try 
to reset the reader) and also:
    2. Permanently disable the SpoolingFileLineReader object so that the next 
time someone tries to call any operation on it, such as readLines(), commit(), 
or close(), throw an IllegalStateException indicating that it is no longer in a 
usable state.
    
    In a future revision of this implementation, we can do something a little 
more graceful, but we have to ensure that we don't lose data in any case.



flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java
<https://reviews.apache.org/r/6377/#comment27740>

    This should be something like:
    
    int bufferSize = bufferMaxLines * bufferMaxLineLength;
    BufferedReader reader = new BufferedReader(new FileReader(nextFile), 
bufferSize);
    reader.mark(bufferSize);
    
    ...since the default buffer size in the BufferedReader constructor is 8124 
in JDK 6 and that may not be enough to hold your bufferSize (or it may be too 
much). Also, the edge-case behavior of BufferedReader (w.r.t. mark and reset) 
is different in situations where the constructor arg is different than the arg 
passed to mark(), so rather than deal with that inconsistency it makes sense to 
do this.



flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java
<https://reviews.apache.org/r/6377/#comment27746>

    This test is failing for me. Not sure why, haven't dug into it much yet.



flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java
<https://reviews.apache.org/r/6377/#comment27747>

    It's failing on this assert



flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java
<https://reviews.apache.org/r/6377/#comment27741>

    This line is not long enough to exceed the buffer size in the mark() call, 
so we are not testing the case of an extremely long line. We need to test the 
situation where reset() throws an exception, since we will lose data in that 
case with the current implementation.
    
    While a barely-too-long line is a valid case to test, we also need a case 
that tests the extremely long line scenario, i.e.:
    
    // make a prefix string as long as the internal buffer size
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < maxLines * maxLineLength; i++) {
      sb.append('x');
    }
    String lotsofXs = sb.toString();
    
    Files.write("file1line1\nfile1line2\nfile1line3\nfile1line4\n" +
      "file1line5\nfile1line6\nfile1line7\nfile1line8\n" +
      lotsOfXs + " reallyreallyreallyreallyLongfile1line9\n" +  // <-- line 
exceeds BufferedReader internal buf
      "file1line10\nfile1line11\nfile1line12\nfile1line13\n",
      f1, Charsets.UTF_8);


- Mike Percy


On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6377/
> -----------------------------------------------------------
> 
> (Updated Oct. 22, 2012, 8:36 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> This patch adds a spooling directory based source. The  idea is that a user 
> can have a spool directory where files are deposited for ingestion into 
> flume. Once ingested, the files are clearly renamed and the implementation 
> guarantees at-least-once delivery semantics similar to those achieved within 
> flume itself, even across failures and restarts of the JVM running the code.
> 
> This helps fill the gap for people who want a way to get reliable delivery of 
> events into flume, but don't want to directly write their application against 
> the flume API. They can simply drop log files off in a spooldir and let flume 
> ingest asynchronously (using some shell scripts or other automated process).
> 
> Unlike the prior iteration, this patch implements a first-class source. It 
> also extends the avro client to support spooling in a similar manner.
> 
> 
> This addresses bug FlUME-1425.
>     https://issues.apache.org/jira/browse/FlUME-1425
> 
> 
> Diffs
> -----
> 
>   
> flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java
>  da804d7 
>   
> flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java
>  abbbf1c 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 
> 4a5ecae 
>   
> flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java
>  PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java 
> PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java
>  PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java 
> PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java
>  PRE-CREATION 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 
> 
> Diff: https://reviews.apache.org/r/6377/diff/
> 
> 
> Testing
> -------
> 
> Extensive unit tests and I also built and played with this using a stub flume 
> agent. If you look at the JIRA I have a configuration file for an agent that 
> will print out Avro events to the command line - that's helpful when testing 
> this.
> 
> 
> Thanks,
> 
> Patrick Wendell
> 
>

Reply via email to