Nabeel Sarwar created NIFI-4911:
-----------------------------------

             Summary: NiFi CompressContent Snappy incompatible behavior with Spark
                 Key: NIFI-4911
                 URL: https://issues.apache.org/jira/browse/NIFI-4911
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 1.2.0
         Environment: HDF 3.0.2 running on CentOS
            Reporter: Nabeel Sarwar
The CompressContent processor uses the SnappyOutputStream class from the snappy-java project. As noted at [https://github.com/xerial/snappy-java], this output format is incompatible with org.apache.hadoop.io.compress.SnappyCodec, which Spark uses by default. When you try to read Snappy files produced by this processor from Spark, you get an empty DataFrame. One can work around this in Spark by applying SnappyInputStream to the raw files instead of relying on SnappyCodec, but it is not obvious at first glance why the default doesn't work. Could a Hadoop-compatible Snappy format be added as an option, the way Snappy Framed is already offered?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
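To make the incompatibility concrete, here is a minimal sketch of the round trip described above, assuming the snappy-java library (org.xerial.snappy) is on the classpath; the class name SnappyRoundTrip and the sample payload are illustrative, not taken from NiFi itself:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

public class SnappyRoundTrip {
    public static void main(String[] args) throws Exception {
        byte[] original = "flowfile content compressed by CompressContent"
                .getBytes(StandardCharsets.UTF_8);

        // What CompressContent does internally: wrap the output in snappy-java's
        // SnappyOutputStream, which writes its own stream header and block framing.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (SnappyOutputStream out = new SnappyOutputStream(compressed)) {
            out.write(original);
        }

        // Hadoop's org.apache.hadoop.io.compress.SnappyCodec does not understand
        // that framing, which is why Spark's default codec yields no records.
        // Reading the raw bytes back through SnappyInputStream recovers the data:
        ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
        try (SnappyInputStream in = new SnappyInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                decompressed.write(buf, 0, n);
            }
        }

        System.out.println(new String(decompressed.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In Spark, the same idea applies: load the files as raw binary (for example via binaryFiles) and decompress each one with SnappyInputStream, rather than letting SnappyCodec try to decode them.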