> On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote: > > One issue now is that this does not respect a max line length due to > > delegating to the BufferedReader. So there is the possibility of a > > too-large line being read but not committed to the channel, resulting in > > data loss because the reset() would throw an exception of > > java.io.IOException: Mark invalid. I think the only way around that is to > > write a custom BufferedReader. See also > > http://stackoverflow.com/questions/5960554/maximum-line-length-for-bufferedreader-readline-in-java > > and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4107821 > > Patrick Wendell wrote: > Hey Mike, > > Designing a bounded buffered reader is a nontrivial endevour (the ones > linked to here are pretty complicated), but I also don't see why we need it. > > If the user has a line whose length is > MAX_LINE, then we are not going > to be able to function correctly in any case, because the buffer isn't big > enough to hold the entire line, and our entire commit semantics are on a > per-line bases. > > Once we've seen a line that is > MAX_LINE, the source should shut down. > Why don't we just check the line length after we read it and shut down the > source with an error? Isn't that the whole reason we have MAX_LINE, so that > it simplifies the sizing of the buffer?
Hey Patrick, I'm +1 on that for now. As long as we can't lose data, we can just fail hard in the initial version and make iterative improvements over time. > On Oct. 16, 2012, 9:49 p.m., Mike Percy wrote: > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java, > > line 117 > > <https://reviews.apache.org/r/6377/diff/4/?file=176275#file176275line117> > > > > would be better for this to return null > > Patrick Wendell wrote: > The reason this function exists is to return a human-readable version of > the current file for print messages and for unit tests. That's why it returns > a String and not a File object. > > I updated the doc string to make the usage more clear, but given that > this is the intent, I'd rather have it return "<none>" - otherwise everyone > that wants to use it to print out state about the LineReader needs to do a > null check and chose their own representation of the null, and it won't be > consistent. If you want to provide a string for the UI to display in the case of no file available, I would recommend providing some helper function which returns this "<none>" string when called for the UIs to use. However, if there is a filename called "<none>" in the directory, there would be no way for the client to be able to tell that it was really a file of that name if this string has a special meaning. - Mike ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6377/#review12458 ----------------------------------------------------------- On Oct. 12, 2012, 5:15 p.m., Patrick Wendell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6377/ > ----------------------------------------------------------- > > (Updated Oct. 12, 2012, 5:15 p.m.) > > > Review request for Flume. > > > Description > ------- > > This patch adds a spooling directory based source. The idea is that a user > can have a spool directory where files are deposited for ingestion into > flume. Once ingested, the files are clearly renamed and the implementation > guarantees at-least-once delivery semantics similar to those achieved within > flume itself, even across failures and restarts of the JVM running the code. > > This helps fill the gap for people who want a way to get reliable delivery of > events into flume, but don't want to directly write their application against > the flume API. They can simply drop log files off in a spooldir and let flume > ingest asynchronously (using some shell scripts or other automated process). > > Unlike the prior iteration, this patch implements a first-class source. It > also extends the avro client to support spooling in a similar manner. > > > This addresses bug FlUME-1425. > https://issues.apache.org/jira/browse/FlUME-1425 > > > Diffs > ----- > > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java > da804d7 > > flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java > abbbf1c > flume-ng-core/src/main/java/org/apache/flume/client/avro/AvroCLIClient.java > 4a5ecae > > flume-ng-core/src/main/java/org/apache/flume/client/avro/BufferedLineReader.java > PRE-CREATION > flume-ng-core/src/main/java/org/apache/flume/client/avro/LineReader.java > PRE-CREATION > > flume-ng-core/src/main/java/org/apache/flume/client/avro/SpoolingFileLineReader.java > PRE-CREATION > > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java > PRE-CREATION > > flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java > PRE-CREATION > > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestBufferedLineReader.java > PRE-CREATION > > flume-ng-core/src/test/java/org/apache/flume/client/avro/TestSpoolingFileLineReader.java > PRE-CREATION > > flume-ng-core/src/test/java/org/apache/flume/source/TestSpoolDirectorySource.java > PRE-CREATION > flume-ng-doc/sphinx/FlumeUserGuide.rst 953a670 > > Diff: https://reviews.apache.org/r/6377/diff/ > > > Testing > ------- > > Extensive unit tests and I also built and played with this using a stub flume > agent. If you look at the JIRA I have a configuration file for an agent that > will print out Avro events to the command line - that's helpful when testing > this. > > > Thanks, > > Patrick Wendell > >
