-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6918/
-----------------------------------------------------------

(Updated Sept. 14, 2012, 5:27 a.m.)


Review request for Flume.


Changes
-------

Mike,

Sorry for taking a while to get around to this. Here is my second attempt, 
based on your comments.

Major changes:

* Made FlumeFormatter specific to HDFSSequenceFile. HDFS{Compressed}DataStream, 
BucketWriter, HDFSEventSink now have no reference to FlumeFormatter. In terms 
of configuration, HDFS{Compressed}DataStream now ignore "hdfs.writeFormat" and 
only care about "hdfs.serializer".

* Altered FlumeFormatter's interface to allow an event to be serialized into 
zero, one or more SequenceFile records, rather than a 1-to-1 event->record 
mapping.

* Renamed FlumeFormatter to SeqFileFormatter and moved it from flume-ng-core to 
flume-hdfs-sink.

* Updated to latest trunk

Notes:

* My IDE seems to have altered some formatting and moved a few imports around. 
I can fix those if you like.

* You're absolutely right that HDFSSequenceFile should be calling syncFs(). I 
always thought having both sync() and syncFs() in the Hadoop API was a recipe 
for disaster :) I will open a new issue for that and fix it in a separate patch.

* I'd like you to look most closely at HDFSCompressedDataStream and check I 
haven't broken anything there. FWIW, the unit test still passes.


Description
-------

This patch allows users to customise the format of HDFS SequenceFiles by 
providing a custom FlumeFormatter implementation.

Currently, the user can set hdfs.writeFormat to either "Text" or "Writable", 
corresponding to HDFSTextFormatter and HDFSWritableFormatter respectively.

With this patch, hdfs.writeFormat can also be set to the full class name of a 
Builder implementation, e.g.:
    
agent_foo.sinks.hdfs-sink.writeFormat=com.mycompany.flume.MyCustomFormatter$Builder

They can also pass custom configuration params to the builder, e.g.:
    agent_foo.sinks.hdfs-sink.writeFormat.ignoreHeaders=foo,bar

These params will be passed to the Builder's build() method as a Context object.

I've tried to be as consistent as possible with the design of 
EventSerializerFactory:
* Use an enum for the different formatter types, rather than static strings.
* Use a Builder, rather than constructing a FlumeFormatter directly.


This addresses bug FLUME-1100.
    https://issues.apache.org/jira/browse/FLUME-1100


Diffs (updated)
-----

  flume-ng-core/src/main/java/org/apache/flume/sink/FlumeFormatter.java 05d60b7 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java
 6408eb9 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java
 80341ef 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java
 2ac06d7 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java
 9a76ecb 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSFormatterFactory.java
 7b47942 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java
 ac9104d 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSTextFormatter.java
 5839dbb 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWritableFormatter.java
 9f03389 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriter.java
 6408e9b 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatter.java
 PRE-CREATION 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatterFactory.java
 PRE-CREATION 
  
flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SeqFileFormatterType.java
 PRE-CREATION 
  
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadDataStream.java
 423f7c1 
  
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadSeqWriter.java
 a911ab7 
  
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockHDFSWriter.java
 24a7cbf 
  
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomFormatter.java
 PRE-CREATION 
  
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java
 bb12188 
  
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSCompressedDataStream.java
 f537732 
  
flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestSeqFileFormatterFactory.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/6918/diff/


Testing
-------

Unit tests included in patch.

Using a patched build of Flume in an internal project (not in production).


Thanks,

Chris Birchall

Reply via email to