Github user mlipkovich commented on a diff in the pull request:
https://github.com/apache/flink/pull/4683#discussion_r140652438
--- Diff: flink-core/pom.xml ---
@@ -52,6 +52,12 @@ under the License.
<artifactId>flink-shaded-asm</artifactId>
</dependency>
+ <dependency>
+ <groupId>org.apache.flink</groupId>
+ <artifactId>flink-shaded-hadoop2</artifactId>
+ <version>${project.version}</version>
+ </dependency>
--- End diff --
What do you think about making this dependency compile-time only?
Regarding the difference between the codecs: as I understand it, Snappy
compressed files are not splittable, so Hadoop splits the raw file into
blocks and compresses each block separately using regular Snappy. If you
download the whole Hadoop Snappy compressed file, regular Snappy will not be
able to decompress it, since it is not aware of the block boundaries.
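
To illustrate the point about block boundaries, here is a minimal sketch. It makes two assumptions that are not in the PR. First, `java.util.zip` DEFLATE stands in for "regular Snappy" so the example runs on the JDK alone. Second, the framing (a 4-byte length before each compressed block) is a simplified, hypothetical layout and not necessarily the exact one Hadoop writes. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: DEFLATE is a stand-in for Snappy, and the
// length-prefixed framing is an assumed, simplified block layout.
class BlockFramingSketch {

    // "Regular" codec: compresses one buffer into one plain stream.
    static byte[] compress(byte[] raw, int off, int len) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw, off, len);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // "Regular" codec: expects exactly one plain stream, no framing.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or invalid stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // Block-aware writer: split the raw data, compress each block
    // separately, and prefix each compressed block with its length.
    static byte[] writeBlocks(byte[] raw, int blockSize) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int off = 0; off < raw.length; off += blockSize) {
            int len = Math.min(blockSize, raw.length - off);
            byte[] block = compress(raw, off, len);
            out.writeInt(block.length);
            out.write(block);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Block-aware reader: uses the length prefixes to find block
    // boundaries, then hands each block to the regular codec.
    static byte[] readBlocks(byte[] framed) throws IOException, DataFormatException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.available() > 0) {
            byte[] block = new byte[in.readInt()];
            in.readFully(block);
            out.write(decompress(block));
        }
        return out.toByteArray();
    }
}
```

With this sketch, `readBlocks(writeBlocks(data, 4096))` returns the original bytes, while calling `decompress` directly on the output of `writeBlocks` is expected to fail, because the plain codec tries to interpret the length prefix as compressed data.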
---