Nabeel Sarwar created NIFI-4911:
-----------------------------------

             Summary: NiFi CompressContent Snappy incompatible behavior with Spark
                 Key: NIFI-4911
                 URL: https://issues.apache.org/jira/browse/NIFI-4911
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 1.2.0
         Environment: HDF 3.0.2 running on Centos
            Reporter: Nabeel Sarwar


The CompressContent processor uses the SnappyOutputStream class from the snappy-java project. As noted at https://github.com/xerial/snappy-java, this output format is incompatible with org.apache.hadoop.io.compress.SnappyCodec, which Spark uses by default. When you try to read Snappy files produced by this processor from Spark, you get an empty dataframe.

One can still read the data in Spark by applying SnappyInputStream to the raw files directly, bypassing Spark's SnappyCodec, but it is not obvious at first glance why the default doesn't work.
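A minimal sketch of that workaround, assuming snappy-java is on the Spark classpath; the input path and app name are hypothetical, and the files are assumed to be line-oriented text compressed by CompressContent:

```scala
import org.apache.spark.sql.SparkSession
import org.xerial.snappy.SnappyInputStream

object ReadNiFiSnappy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadNiFiSnappy").getOrCreate()
    val sc = spark.sparkContext

    // Read the compressed files as raw binary so Spark's default
    // SnappyCodec never touches them, then decompress each one with
    // snappy-java's own SnappyInputStream.
    val lines = sc.binaryFiles("hdfs:///data/nifi-output/*.snappy")
      .flatMap { case (_, portableStream) =>
        val in = new SnappyInputStream(portableStream.open())
        scala.io.Source.fromInputStream(in, "UTF-8").getLines()
      }

    lines.take(10).foreach(println)
    spark.stop()
  }
}
```

This works because SnappyInputStream understands the stream format SnappyOutputStream wrote, whereas SnappyCodec expects Hadoop's own block framing.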

Is there a way to add a Hadoop-compatible Snappy option, in the same way that Snappy Framed is already offered?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
