-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6918/
-----------------------------------------------------------
(Updated Sept. 14, 2012, 5:27 a.m.)
Review request for Flume.
Changes
-------
Mike,
Sorry for taking a while to get around to this. Here is my second attempt,
based on your comments.
Major changes:
* Made FlumeFormatter specific to HDFSSequenceFile. HDFS{Compressed}DataStream,
BucketWriter, HDFSEventSink now have no reference to FlumeFormatter. In terms
of configuration, HDFS{Compressed}DataStream now ignore "hdfs.writeFormat" and
only care about "hdfs.serializer".
* Altered FlumeFormatter's interface to allow an event to be serialized into
zero, one or more SequenceFile records, rather than a 1-to-1 event->record
mapping.
* Renamed FlumeFormatter to SeqFileFormatter and moved it from flume-ng-core to
flume-hdfs-sink.
* Updated to latest trunk
Notes:
* My IDE seems to have altered some formatting and moved a few imports around.
I can fix those if you like.
* You're absolutely right that HDFSSequenceFile should be calling syncFs(). I
always thought having both sync() and syncFs() in the Hadoop API was a recipe
for disaster :) I will open a new issue for that and fix it in a separate patch.
* I'd like you to look most closely at HDFSCompressedDataStream and check I
haven't broken anything there. FWIW, the unit test still passes.
Description
-------
This patch allows users to customise the format of HDFS SequenceFiles by
providing a custom FlumeFormatter implementation.
Currently, the user can set hdfs.writeFormat to either "Text" or "Writable",
corresponding to HDFSTextFormatter and HDFSWritableFormatter respectively.
With this patch, hdfs.writeFormat can also be set to the full class name of a
Builder implementation, e.g.:
agent_foo.sinks.hdfs-sink.writeFormat=com.mycompany.flume.MyCustomFormatter$Builder
They can also pass custom configuration params to the builder, e.g.:
agent_foo.sinks.hdfs-sink.writeFormat.ignoreHeaders=foo,bar
These params will be passed to the Builder's build() method as a Context object.
I've tried to be as consistent as possible with the design of
EventSerializerFactory:
* Use an enum for the different formatter types, rather than static strings.
* Use a Builder, rather than constructing a FlumeFormatter directly.
This addresses bug FLUME-1100.
https://issues.apache.org/jira/browse/FLUME-1100
Diffs (updated)
-----
flume-ng-core/src/main/java/org/apache/flume/sink/FlumeFormatter.java 05d60b7
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
6408eb9
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java
80341ef
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java
2ac06d7
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
9a76ecb
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSFormatterFactory.java
7b47942
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java
ac9104d
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSTextFormatter.java
5839dbb
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWritableFormatter.java
9f03389
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriter.java
6408e9b
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatter.java
PRE-CREATION
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatterFactory.java
PRE-CREATION
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatterType.java
PRE-CREATION
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadDataStream.java
423f7c1
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java
a911ab7
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockHDFSWriter.java
24a7cbf
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomFormatter.java
PRE-CREATION
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java
bb12188
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSCompressedDataStream.java
f537732
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestSeqFileFormatterFactory.java
PRE-CREATION
Diff: https://reviews.apache.org/r/6918/diff/
Testing
-------
Unit tests included in patch.
Using a patched build of Flume in an internal project (not in production).
Thanks,
Chris Birchall