[ 
https://issues.apache.org/jira/browse/FLUME-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393556#comment-13393556
 ] 

Leslin (Hong Xiang Lin) commented on FLUME-1200:
------------------------------------------------

Below is the final implementation:
1). If the user sets hdfs.codeC when hdfs.fileType = DataStream, the sink still 
writes a plain DataStream file, with no compression extension such as .snappy. A 
warning message is logged to show that the codeC will be ignored. 
2). A pre-check makes sure that a codec is set when fileType is 
CompressedStream.
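
The two rules above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class CodecPreCheck, the method resolveCodec, and the in-memory warnings list are hypothetical names, and java.util.Objects.requireNonNull stands in for the Guava Preconditions.checkNotNull call that appears in the stack trace below. Exact, case-sensitive matching of the fileType strings is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for the pre-check described above; not the actual HDFSEventSink code. */
public class CodecPreCheck {

    /** Warnings are collected in a list (instead of a logger) to keep the sketch self-contained. */
    static final List<String> warnings = new ArrayList<>();

    /**
     * Returns the codec name the sink should use, or null for uncompressed output.
     * Throws NullPointerException when CompressedStream is requested without a codec.
     */
    static String resolveCodec(String fileType, String codecName) {
        if ("CompressedStream".equals(fileType)) {
            // Rule 2: a codec is mandatory for CompressedStream.
            return Objects.requireNonNull(codecName,
                    "It's essential to set compress codec when fileType is: " + fileType);
        }
        if ("DataStream".equals(fileType) && codecName != null) {
            // Rule 1: DataStream is uncompressed, so the codec is ignored with a warning.
            warnings.add("CodeC: " + codecName + " is ignored as fileType: "
                    + fileType + " is uncompressed.");
            return null;
        }
        // Any other combination (e.g. SequenceFile): pass the setting through unchanged.
        return codecName;
    }

    public static void main(String[] args) {
        System.out.println(resolveCodec("CompressedStream", "DefaultCodec"));
        System.out.println(resolveCodec("DataStream", "snappyCodec"));
        System.out.println(warnings);
    }
}
```

Failing at configuration time, rather than when the first file is written, matches scenario 1 below, where the agent refuses to load the configuration.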

I tested the following scenarios:
1. CompressedStream without codec  --> an exception is thrown
agent.sinks.k1.hdfs.fileType = CompressedStream
#agent.sinks.k1.hdfs.codeC = DefaultCodec

12/06/17 22:46:35 INFO sink.DefaultSinkFactory: Creating instance of sink k1 
typeHDFS
12/06/17 22:46:35 ERROR properties.PropertiesFileConfigurationProvider: Failed 
to load configuration data. Exception follows.
java.lang.NullPointerException: It's essential to set compress codec when 
fileType is: CompressedStream
        at 
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at 
org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:221)

2. Works fine, output file with .deflate extension.
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = DefaultCodec

3. Works fine, output file without compress extension.
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = snappyCodec

4. A warning is logged; output file has no compression extension.
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.codeC = snappyCodec
12/06/17 23:08:44 INFO snappy.LoadSnappy: Snappy native library loaded
12/06/17 23:08:44 WARN hdfs.HDFSEventSink: CodeC: snappyCodec is ignored as 
fileType: DataStream is uncompressed. To change fileType if want output 
compressed.
12/06/17 23:08:44 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false

5. Works fine, output file with .snappy extension.
agent.sinks.k1.hdfs.fileType = SequenceFile 
agent.sinks.k1.hdfs.codeC = snappyCodec

6. Works fine, output file without .snappy extension
agent.sinks.k1.hdfs.fileType = SequenceFile
#agent.sinks.k1.hdfs.codeC = snappyCodec
                
> HDFSEventSink causes *.snappy file to be created in HDFS even when snappy 
> isn't used (due to missing lib)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-1200
>                 URL: https://issues.apache.org/jira/browse/FLUME-1200
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>         Environment: RHEL 6.2 64-bit
>            Reporter: Will McQueen
>            Assignee: Leslin (Hong Xiang Lin)
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1200.patch
>
>
> If I use HDFSEventSink and specify the codec to be snappy, then the sink 
> writes data to HDFS with the ".snappy" extension... but the content of those 
> HDFS files is not in snappy format when the snappy libs aren't found. The log 
> files mention this:
>      2012-05-11 19:38:49,868 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
>      2012-05-11 19:38:49,868 WARN snappy.LoadSnappy: Snappy native library 
> not loaded
> ...and I think it should be an error rather than a warning... the sink 
> shouldn't write data at all to HDFS if it's not in the format expected by the 
> config file (ie, not compressed with snappy). The config file I used is:
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> #
> agent.channels.c1.type = MEMORY
> #
> agent.sources.r1.channels = c1
> agent.sources.r1.type = SEQ
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = LOGGER
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = HDFS
> agent.sinks.k1.hdfs.path = hdfs://<host>:<port>:<path>
> agent.sinks.k1.hdfs.fileType = DataStream
> agent.sinks.k1.hdfs.codeC = SnappyCodec

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
