Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client

Mike Percy Fri, 19 Oct 2012 15:43:33 -0700


> On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote:
> > One issue now is that this does not respect a max line length due to 
> > delegating to the BufferedReader. So there is the possibility of a 
> > too-large line being read but not committed to the channel, resulting in 
> > data loss because the reset() would throw an exception of 
> > java.io.IOException: Mark invalid. I think the only way around that is to 
> > write a custom BufferedReader. See also 
> > http://stackoverflow.com/questions/5960554/maximum-line-length-for-bufferedreader-readline-in-java
> >  and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4107821
> 
> Patrick Wendell wrote:
>     Hey Mike,
>     
>     Designing a bounded buffered reader is a nontrivial endevour (the ones 
> linked to here are pretty complicated), but I also don't see why we need it.
>     
>     If the user has a line whose length is > MAX_LINE, then we are not going 
> to be able to function correctly in any case, because the buffer isn't big 
> enough to hold the entire line, and our entire commit semantics are on a 
> per-line bases. 
>     
>     Once we've seen a line that is > MAX_LINE, the source should shut down. 
> Why don't we just check the line length after we read it and shut down the 
> source with an error? Isn't that the whole reason we have MAX_LINE, so that 
> it simplifies the sizing of the buffer?


Hey Patrick, I'm +1 on that for now. As long as we can't lose data, we can just 
fail hard in the initial version and make iterative improvements over time.


> On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote:
> > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java,
> >  line 117
> > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line117>
> >
> >     would be better for this to return null
> 
> Patrick Wendell wrote:
>     The reason this function exists is to return a human-readable version of 
> the current file for print messages and for unit tests. That's why it returns 
> a String and not a File object.
>     
>     I updated the doc string to make the usage more clear, but given that 
> this is the intent, I'd rather have it return "<none>" - otherwise everyone 
> that wants to use it to print out state about the LineReader needs to do a 
> null check and chose their own representation of the null, and it won't be 
> consistent.

If you want to provide a string for the UI to display in the case of no file 
available, I would recommend providing some helper function which returns this 
"<none>" string when called for the UIs to use. However, if there is a filename 
called "<none>" in the directory, there would be no way for the client to be 
able to tell that it was really a file of that name if this string has a 
special meaning.


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6377/#review12458
-----------------------------------------------------------


On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6377/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2012, 5:15 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> This patch adds a spooling directory based source. The  idea is that a user 
> can have a spool directory where files are deposited for ingestion into 
> flume. Once ingested, the files are clearly renamed and the implementation 
> guarantees at-least-once delivery semantics similar to those achieved within 
> flume itself, even across failures and restarts of the JVM running the code.
> 
> This helps fill the gap for people who want a way to get reliable delivery of 
> events into flume, but don't want to directly write their application against 
> the flume API. They can simply drop log files off in a spooldir and let flume 
> ingest asynchronously (using some shell scripts or other automated process).
> 
> Unlike the prior iteration, this patch implements a first-class source. It 
> also extends the avro client to support spooling in a similar manner.
> 
> 
> This addresses bug FlUME-1425.
>     https://issues.apache.org/jira/browse/FlUME-1425
> 
> 
> Diffs
> -----
> 
>   
> flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java
>  da804d7 
>   
> flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java
>  abbbf1c 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java 
> 4a5ecae 
>   
> flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java
>  PRE-CREATION 
>   flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java 
> PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java
>  PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java 
> PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java
>  PRE-CREATION 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 
> 
> Diff: https://reviews.apache.org/r/6377/diff/
> 
> 
> Testing
> -------
> 
> Extensive unit tests and I also built and played with this using a stub flume 
> agent. If you look at the JIRA I have a configuration file for an agent that 
> will print out Avro events to the command line - that's helpful when testing 
> this.
> 
> 
> Thanks,
> 
> Patrick Wendell
> 
>

Re: Review Request: FLUME-1425: Create a SpoolDirectory Source and Client

Reply via email to