-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6918/#review11085
-----------------------------------------------------------
Hey, thanks for this contribution! I agree with the overall direction of the
approach here. However since by doing this, we are exposing these APIs as
public APIs, I think now would be a great time to take a good hard look at
FlumeFormatter itself...
FlumeFormatter is not a very good interface, in my opinion. It is actually two
interfaces merged into one... someone would either call the
SequenceFile-related stuff, or would call getBytes(). I don't really think that
makes much sense and I think getBytes() should be removed from FlumeFormatter.
The only class that calls it is HDFSCompressedDataStream, which should really
just be using EventSerializer (which is what HDFSDataStream uses).
One additional thing we should consider is whether it would be doable to allow
a FlumeFormatter (which, sans getBytes(), is pretty targeted for SequenceFiles)
to write multiple key/value pairs from a single event. The desire for that
feature came up on the list a couple weeks ago, and if we are committing to a
public API then I'd like to give it a shot. It's easy with EventSerializer, for
example.
Any thoughts on this? I think it's acceptable to expose the SequenceFile.Writer
in the interface if we need to do that to make this possible. How about
changing it to something like this?
interface FlumeFormatter {
Class getKeyClass();
Class getValueClass();
void write(Event e, SequenceFile.Writer out);
interface Builder {
FlumeFormatter build(Context context);
}
}
- Mike Percy
On Sept. 5, 2012, 5:51 a.m., Chris Birchall wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6918/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2012, 5:51 a.m.)
>
>
> Review request for Flume.
>
>
> Description
> -------
>
> This patch allows users to customise the format of HDFS SequenceFiles by
> providing a custom FlumeFormatter implementation.
>
> Currently, the user can set hdfs.writeFormat to either "Text" or "Writable",
> corresponding to HDFSTextFormatter and HDFSWritableFormatter respectively.
>
> With this patch, hdfs.writeFormat can also be set to the full class name of a
> Builder implementation, e.g.:
>
> agent_foo.sinks.hdfs-sink.writeFormat=com.mycompany.flume.MyCustomFormatter$Builder
>
> They can also pass custom configuration params to the builder, e.g.:
> agent_foo.sinks.hdfs-sink.writeFormat.ignoreHeaders=foo,bar
>
> These params will be passed to the Builder's build() method as a Context
> object.
>
> I've tried to be as consistent as possible with the design of
> EventSerializerFactory:
> * Use an enum for the different formatter types, rather than static strings.
> * Use a Builder, rather than constructing a FlumeFormatter directly.
>
>
> This addresses bug FLUME-1100.
> https://issues.apache.org/jira/browse/FLUME-1100
>
>
> Diffs
> -----
>
> flume-ng-core/src/main/java/org/apache/flume/sink/FlumeFormatter.java
> 05d60b7
>
> flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
> 9a76ecb
>
> flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSFormatterFactory.java
> 7b47942
>
> flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSFormatterType.java
> PRE-CREATION
>
> flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSTextFormatter.java
> 5839dbb
>
> flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWritableFormatter.java
> 9f03389
>
> flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomFormatter.java
> PRE-CREATION
>
> flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSFormatterFactory.java
> PRE-CREATION
>
> Diff: https://reviews.apache.org/r/6918/diff/
>
>
> Testing
> -------
>
> Unit tests included in patch.
>
> Using a patched build of Flume in an internal project (not in production).
>
>
> Thanks,
>
> Chris Birchall
>
>