> On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, > > line 300 > > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line300> > > > > This should be something like: > > > > int bufferSize = bufferMaxLines * bufferMaxLineLength; > > BufferedReader reader = new BufferedReader(new FileReader(nextFile), > > bufferSize); > > reader.mark(bufferSize); > > > > ...since the default buffer size in the BufferedReader constructor is > > 8124 in JDK 6 and that may not be enough to hold your bufferSize (or it may > > be too much). Also, the edge-case behavior of BufferedReader (w.r.t. mark > > and reset) is different in situations where the constructor arg is > > different than the arg passed to mark(), so rather than deal with that > > inconsistency it makes sense to do this.
Ya good catch. > On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, > > line 188 > > <https://reviews.apache.org/r/6377/diff/5/?file=178797#file178797line188> > > > > If the line length exceeds the length of the BufferedReader, this > > reset() call will throw a java.io.IOException: Mark invalid > > > > Also, committed = true when you hit this case, since committed = false > > does not get set until right before the return statement. > > > > Therefore, assuming the client catches the exception and recovers from > > it, the next call to readLines(n) will simply skip over those lines and the > > data will be lost. > > > > So if there is an error in which the line is too long, since we have > > decided that we going to fail then we should fail explicitly and > > permanently: > > 1. Throw a FlumeException indicating a too-long line was seen (do not > > try to reset the reader) and also: > > 2. Permanently disable the SpoolingFileLineReader object so that the > > next time someone tries to call any operation on it, such as readLines(), > > commit(), or close(), throw an IllegalStateException indicating that it is > > no longer in a usable state. > > > > In a future revision of this implementation, we can do something a > > little more graceful, but we have to ensure that we don't lose data in any > > case. Yep this should stop the world when it fails... my mistake. Added some tests to make sure this works. > On Oct. 30, 2012, 6:22 a.m., Mike Percy wrote: > > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java, > > line 363 > > <https://reviews.apache.org/r/6377/diff/5/?file=178801#file178801line363> > > > > This test is failing for me. Not sure why, haven't dug into it much yet. I can't yet reproduce this failure. - Patrick ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12907 ----------------------------------------------------------- On Oct. 22, 2012, 8:36 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 22, 2012, 8:36 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user > can have a spool directory where files are deposited for ingestion into > flume. Once ingested, the files are clearly renamed and the implementation > guarantees at-least-once delivery semantics similar to those achieved within > flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of > events into flume, but don't want to directly write their application against > the flume API. They can simply drop log files off in a spooldir and let flume > ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It > also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java > da804d7 > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java > abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java > 4a5ecae > > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java > PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java > PRE-CREATION > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java > PRE-CREATION > > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java > PRE-CREATION > > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java > PRE-CREATION > > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java > PRE-CREATION > > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java > PRE-CREATION > > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java > PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume > agent. If you look at the JIRA I have a configuration file for an agent that > will print out Avro events to the command line - that's helpful when testing > this. > > > Thanks, > > Patrick Wendell > >
