[ https://issues.apache.org/jira/browse/FLUME-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393556#comment-13393556 ]

Leslin (Hong Xiang Lin) edited comment on FLUME-1200 at 6/18/12 2:19 PM:
-------------------------------------------------------------------------

Below is the final implementation:
1) If the user sets hdfs.codeC while hdfs.fileType = DataStream, the sink still writes 
the file as an uncompressed DataStream, with no compression extension such as .snappy. 
A warning message is added to make clear that the codeC setting is ignored.
2) A pre-check now requires a codec whenever fileType is set to CompressedStream, as 
sketched below.
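
For reference, here is a minimal, hypothetical Java sketch of the two checks (this is 
not the actual patch; the exception and warning texts are copied from the test output 
below, while the class and method names are made up for illustration):

import com.google.common.base.Preconditions;

public class CodecPrecheckSketch {
    // 2) fail fast at configure time when CompressedStream has no codec;
    // 1) otherwise warn that a configured codec is ignored for uncompressed file types.
    public static void check(String fileType, String codecName) {
        if ("CompressedStream".equalsIgnoreCase(fileType)) {
            Preconditions.checkNotNull(codecName,
                "It's essential to set compress codec when fileType is: " + fileType);
        } else if (codecName != null) {
            System.out.println("WARN CodeC: " + codecName
                + " is ignored as fileType: " + fileType + " is uncompressed.");
        }
    }

    public static void main(String[] args) {
        check("DataStream", "snappyCodec");   // prints the warning from scenario 4
        check("CompressedStream", null);      // throws the NPE from scenario 1
    }
}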

After careful consideration, the point from the last comment, "(2) if user set 
hdfs.codeC while fileType is CompressedStream, but codec class is unavailable", is a 
separate problem and should be tracked in an independent JIRA. There is also another 
codec-related problem; I will address both of them together in a new JIRA.
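
Purely as an illustration of what such a codec-availability pre-check could look like 
(this is not the planned fix; the helper below is a made-up example that only covers 
the case where no matching codec class is registered with Hadoop, and checking the 
native library is a separate concern):

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.util.ReflectionUtils;

public class CodecLookupSketch {
    // Resolve a codec by its simple class name (e.g. "SnappyCodec"); fail if none matches.
    public static CompressionCodec resolve(String codecName, Configuration conf) {
        List<Class<? extends CompressionCodec>> codecs =
            CompressionCodecFactory.getCodecClasses(conf);
        for (Class<? extends CompressionCodec> cls : codecs) {
            if (cls.getSimpleName().equalsIgnoreCase(codecName)) {
                return ReflectionUtils.newInstance(cls, conf);
            }
        }
        throw new IllegalArgumentException("Unsupported compression codec: " + codecName);
    }
}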

I tested the following scenarios:
1. CompressedStream without a codec --> an exception is thrown
agent.sinks.k1.hdfs.fileType = CompressedStream
#agent.sinks.k1.hdfs.codeC = DefaultCodec

12/06/17 22:46:35 INFO sink.DefaultSinkFactory: Creating instance of sink k1 
typeHDFS
12/06/17 22:46:35 ERROR properties.PropertiesFileConfigurationProvider: Failed 
to load configuration data. Exception follows.
java.lang.NullPointerException: It's essential to set compress codec when 
fileType is: CompressedStream
        at 
com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at 
org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:221)

2. Works fine; the output file has a .deflate extension.
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = DefaultCodec

3. Works fine; the output file has no compression extension.
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = snappyCodec

4. A warning is logged; the output file has no compression extension.
agent.sinks.k1.hdfs.fileType = DataStream 
agent.sinks.k1.hdfs.codeC = snappyCodec
12/06/17 23:08:44 INFO snappy.LoadSnappy: Snappy native library loaded
12/06/17 23:08:44 WARN hdfs.HDFSEventSink: CodeC: snappyCodec is ignored as 
fileType: DataStream is uncompressed. To change fileType if want output 
compressed.
12/06/17 23:08:44 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false

5. Works fine; the output file has a .snappy extension.
agent.sinks.k1.hdfs.fileType = SequenceFile 
agent.sinks.k1.hdfs.codeC = snappyCodec

6. Works fine; the output file has no .snappy extension.
agent.sinks.k1.hdfs.fileType = SequenceFile
#agent.sinks.k1.hdfs.codeC = snappyCodec
                
> HDFSEventSink causes *.snappy file to be created in HDFS even when snappy 
> isn't used (due to missing lib)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-1200
>                 URL: https://issues.apache.org/jira/browse/FLUME-1200
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>         Environment: RHEL 6.2 64-bit
>            Reporter: Will McQueen
>            Assignee: Leslin (Hong Xiang Lin)
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1200.patch
>
>
> If I use HDFSEventSink and specify the codec to be snappy, then the sink 
> writes data to HDFS with the ".snappy" extension... but the content of those 
> HDFS files is not in snappy format when the snappy libs aren't found. The log 
> files mention this:
>      2012-05-11 19:38:49,868 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
>      2012-05-11 19:38:49,868 WARN snappy.LoadSnappy: Snappy native library 
> not loaded
> ...and I think it should be an error rather than a warning... the sink 
> shouldn't write data at all to HDFS if it's not in the format expected by the 
> config file (ie, not compressed with snappy). The config file I used is:
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> #
> agent.channels.c1.type = MEMORY
> #
> agent.sources.r1.channels = c1
> agent.sources.r1.type = SEQ
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = LOGGER
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = HDFS
> agent.sinks.k1.hdfs.path = hdfs://<host>:<port>:<path>
> agent.sinks.k1.hdfs.fileType = DataStream
> agent.sinks.k1.hdfs.codeC = SnappyCodec
