[ https://issues.apache.org/jira/browse/FLUME-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393556#comment-13393556 ]
Leslin (Hong Xiang Lin) commented on FLUME-1200:
------------------------------------------------
Below is the final implementation:
1) If the user sets hdfs.codeC when hdfs.fileType = DataStream, the sink still
writes the file as a DataStream, with no compression extension such as .snappy.
A warning message is logged to say that the codeC setting is ignored.
2) A pre-check ensures that a codec is set when fileType is CompressedStream.
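The two rules above can be sketched as a single validation step. This is a hypothetical, simplified stand-in for the checks in HDFSEventSink.configure(), not the patch itself: the class, enum and method names are assumptions, and java.util.Objects.requireNonNull is used in place of Guava's Preconditions.checkNotNull so the sketch has no dependencies.

```java
import java.util.Objects;
import java.util.logging.Logger;

// Hypothetical sketch of the codec/fileType checks; not the actual Flume source.
public class CodecConfigCheck {
    private static final Logger LOG = Logger.getLogger(CodecConfigCheck.class.getName());

    enum FileType { SequenceFile, DataStream, CompressedStream }

    /** Returns the codec name the sink should use, or null when no codec applies. */
    static String resolveCodec(FileType fileType, String codeC) {
        switch (fileType) {
            case CompressedStream:
                // Rule 2: a codec is mandatory; a missing one throws NullPointerException.
                return Objects.requireNonNull(codeC,
                    "It's essential to set compress codec when fileType is: " + fileType);
            case DataStream:
                // Rule 1: DataStream output is never compressed, so the codec is dropped.
                if (codeC != null) {
                    LOG.warning("CodeC: " + codeC + " is ignored as fileType: "
                        + fileType + " is uncompressed.");
                }
                return null;
            default:
                // SequenceFile: the codec is optional and passed through unchanged.
                return codeC;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveCodec(FileType.CompressedStream, "DefaultCodec"));
        System.out.println(resolveCodec(FileType.DataStream, "snappyCodec"));
        System.out.println(resolveCodec(FileType.SequenceFile, null));
        try {
            resolveCodec(FileType.CompressedStream, null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at configuration time, rather than when the first file is opened, means a bad combination is reported once at agent startup instead of on every write.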
I tested the following scenarios:
1. CompressedStream without a codec: an exception is thrown.
agent.sinks.k1.hdfs.fileType = CompressedStream
#agent.sinks.k1.hdfs.codeC = DefaultCodec
12/06/17 22:46:35 INFO sink.DefaultSinkFactory: Creating instance of sink k1 typeHDFS
12/06/17 22:46:35 ERROR properties.PropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
java.lang.NullPointerException: It's essential to set compress codec when fileType is: CompressedStream
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:221)
2. Works fine; the output file has a .deflate extension.
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = DefaultCodec
3. Works fine; the output file has no compression extension.
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = snappyCodec
4. A warning is logged; the output file has no compression extension.
agent.sinks.k1.hdfs.fileType = DataStream
#agent.sinks.k1.hdfs.codeC = snappyCodec
12/06/17 23:08:44 INFO snappy.LoadSnappy: Snappy native library loaded
12/06/17 23:08:44 WARN hdfs.HDFSEventSink: CodeC: snappyCodec is ignored as fileType: DataStream is uncompressed. To change fileType if want output compressed.
12/06/17 23:08:44 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
5. Works fine; the output file has a .snappy extension.
agent.sinks.k1.hdfs.fileType = SequenceFile
agent.sinks.k1.hdfs.codeC = snappyCodec
6. Works fine; the output file has no .snappy extension.
agent.sinks.k1.hdfs.fileType = SequenceFile
#agent.sinks.k1.hdfs.codeC = snappyCodec
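Scenarios 2 and 5 show the codec's default extension (.deflate for DefaultCodec, .snappy for SnappyCodec) on the output file, and the configs spell the codec both as snappyCodec and SnappyCodec, which suggests the name is matched case-insensitively. The sketch below illustrates only that lookup, under those assumptions: the hard-coded table and the class and method names are hypothetical, since in Hadoop the extension comes from CompressionCodec.getDefaultExtension() on the resolved codec class. It deliberately does not model when the sink actually appends the extension.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup table standing in for Hadoop's codec discovery.
public class CodecNames {
    private static final Map<String, String> EXTENSIONS = new HashMap<>();
    static {
        // Keys are lower-cased codec class simple names.
        EXTENSIONS.put("defaultcodec", ".deflate");
        EXTENSIONS.put("gzipcodec", ".gz");
        EXTENSIONS.put("bzip2codec", ".bz2");
        EXTENSIONS.put("snappycodec", ".snappy");
    }

    /** Resolves a configured codeC value case-insensitively to its default extension. */
    static Optional<String> extensionFor(String codeC) {
        if (codeC == null) {
            return Optional.empty(); // no codec configured
        }
        return Optional.ofNullable(EXTENSIONS.get(codeC.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("DefaultCodec")); // Optional[.deflate]
        System.out.println(extensionFor("snappyCodec"));  // Optional[.snappy]
        System.out.println(extensionFor("LzoCodec"));     // Optional.empty
    }
}
```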
> HDFSEventSink causes *.snappy file to be created in HDFS even when snappy isn't used (due to missing lib)
> ---------------------------------------------------------------------------------------------------------
>
> Key: FLUME-1200
> URL: https://issues.apache.org/jira/browse/FLUME-1200
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.2.0
> Environment: RHEL 6.2 64-bit
> Reporter: Will McQueen
> Assignee: Leslin (Hong Xiang Lin)
> Fix For: v1.2.0
>
> Attachments: FLUME-1200.patch
>
>
> If I use HDFSEventSink and specify the codec to be snappy, then the sink
> writes data to HDFS with the ".snappy" extension... but the content of those
> HDFS files is not in snappy format when the snappy libs aren't found. The log
> files mention this:
> 2012-05-11 19:38:49,868 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2012-05-11 19:38:49,868 WARN snappy.LoadSnappy: Snappy native library not loaded
> ...and I think it should be an error rather than a warning: the sink
> shouldn't write any data to HDFS if it's not in the format expected by the
> config file (i.e., not compressed with snappy). The config file I used is:
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> #
> agent.channels.c1.type = MEMORY
> #
> agent.sources.r1.channels = c1
> agent.sources.r1.type = SEQ
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = LOGGER
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = HDFS
> agent.sinks.k1.hdfs.path = hdfs://<host>:<port>/<path>
> agent.sinks.k1.hdfs.fileType = DataStream
> agent.sinks.k1.hdfs.codeC = SnappyCodec
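The reporter's proposal (fail instead of warning when Snappy's native library is missing) could look roughly like the guard below. This is a hypothetical sketch of that suggestion, not what the final implementation described in the comment does; the class and method names are illustrative, and the BooleanSupplier stands in for whatever native-library availability check the Hadoop version in use provides.

```java
import java.util.function.BooleanSupplier;

// Hypothetical fail-fast guard for the behaviour proposed in the issue.
public class SnappyGuard {
    /**
     * Throws if Snappy is requested but its native library is unavailable,
     * so no uncompressed data is written under a .snappy name.
     */
    static void requireCodecAvailable(String codeC, BooleanSupplier nativeSnappyLoaded) {
        boolean wantsSnappy = codeC != null && codeC.equalsIgnoreCase("SnappyCodec");
        if (wantsSnappy && !nativeSnappyLoaded.getAsBoolean()) {
            throw new IllegalStateException("codeC " + codeC
                + " requested but the Snappy native library is not loaded");
        }
    }

    public static void main(String[] args) {
        requireCodecAvailable("DefaultCodec", () -> false); // passes: not Snappy
        requireCodecAvailable("SnappyCodec", () -> true);   // passes: library present
        try {
            requireCodecAvailable("SnappyCodec", () -> false);
        } catch (IllegalStateException e) {
            System.out.println("configuration rejected: " + e.getMessage());
        }
    }
}
```

Passing the availability check in as a parameter keeps the guard testable without a native library on the build machine.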
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira