Nabeel Sarwar created NIFI-4911:
-----------------------------------
Summary: NiFi CompressContent Snappy incompatible behavior with
Spark
Key: NIFI-4911
URL: https://issues.apache.org/jira/browse/NIFI-4911
Project: Apache NiFi
Issue Type: Bug
Affects Versions: 1.2.0
Environment: HDF 3.0.2 running on CentOS
Reporter: Nabeel Sarwar
The CompressContent processor uses the SnappyOutputStream class from the
snappy-java project. As noted at
[https://github.com/xerial/snappy-java], this output format is incompatible
with org.apache.hadoop.io.compress.SnappyCodec, which Spark uses by default.
When you try to read Snappy files produced by this processor from Spark, you
get an empty dataframe.
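The incompatibility is visible in the on-disk framing: snappy-java's SnappyOutputStream writes a stream header that begins with the magic bytes 0x82 'S' 'N' 'A' 'P' 'P' 'Y' 0x00, whereas Hadoop's SnappyCodec emits length-prefixed blocks with no such header. A minimal sketch (the class name SnappyFormatSniffer is hypothetical, but the magic bytes are from snappy-java's documented stream format) that tells the two apart by inspecting the first bytes of a file:

```java
import java.util.Arrays;

public class SnappyFormatSniffer {

    // Header magic written by org.xerial.snappy.SnappyOutputStream:
    // 0x82 'S' 'N' 'A' 'P' 'P' 'Y' 0x00
    static final byte[] SNAPPY_JAVA_MAGIC = {
        (byte) 0x82, 'S', 'N', 'A', 'P', 'P', 'Y', 0x00
    };

    /**
     * Returns true if the given leading bytes look like snappy-java
     * stream output. Hadoop's SnappyCodec block format starts with a
     * 4-byte uncompressed-length field instead, so it will not match.
     */
    public static boolean looksLikeSnappyJava(byte[] header) {
        return header.length >= SNAPPY_JAVA_MAGIC.length
            && Arrays.equals(
                   Arrays.copyOf(header, SNAPPY_JAVA_MAGIC.length),
                   SNAPPY_JAVA_MAGIC);
    }
}
```

Sniffing the header like this is one way to confirm which of the two formats a flow file actually contains before deciding how to read it downstream.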
One can work around this in Spark by reading the raw files with
SnappyInputStream instead of relying on the SnappyCodec, but it is not
obvious at first glance why the default doesn't work.
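The workaround might look like the following in a Spark job (a hedged sketch, not a tested recipe: it assumes the snappy-java dependency org.xerial.snappy is on the classpath, that the payloads are UTF-8 text, and the path hdfs:///data/*.snappy is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.xerial.snappy.SnappyInputStream;

public class ReadNiFiSnappy {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("read-nifi-snappy").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // binaryFiles returns (path, bytes) pairs with no codec guessing,
        // so the snappy-java framing can be undone explicitly.
        List<String> rows = jsc.binaryFiles("hdfs:///data/*.snappy")
            .mapValues(content -> {
                try (InputStream in = new SnappyInputStream(
                         new ByteArrayInputStream(content.toArray()))) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return new String(out.toByteArray(), "UTF-8");
                }
            })
            .values()
            .collect();

        spark.createDataset(rows, Encoders.STRING()).show();
    }
}
```

Collecting to the driver is only sensible for small inputs; the point of the sketch is that the decompression has to be done by hand rather than by the default SnappyCodec.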
Is there a way to add a Hadoop-compatible Snappy option (e.g.
HadoopCompatibleSnappy), the same way Snappy Framed is already offered?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)