Arina Ielchiieva created DRILL-7419:
---------------------------------------
Summary: Enhance Drill splitting logic for compressed files
Key: DRILL-7419
URL: https://issues.apache.org/jira/browse/DRILL-7419
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.16.0
Reporter: Arina Ielchiieva
By default, Drill treats all compressed files as non-splittable. Drill uses
BlockMapBuilder to split a file into blocks where possible. According to its
code, it splits a file only if blockSplittable is set to true AND the file IS
NOT compressed. So even if a format is block-splittable, a file in that format
that arrives compressed won't be split.
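A minimal Java sketch of the rule described above, for illustration only. It is not the actual BlockMapBuilder code. The class CurrentSplitRule, the method shouldSplit, and the extension-to-codec map are hypothetical stand-ins. Hadoop's CompressionCodecFactory also resolves a codec from the file name suffix, which is why the sketch keys on extensions.

```java
import java.util.Map;

class CurrentSplitRule {
    // Hypothetical stand-in for codec lookup: file extension -> codec name.
    static final Map<String, String> CODECS_BY_EXTENSION = Map.of(
        ".gz", "gzip",
        ".bz2", "bzip2");

    /** Returns the codec name for the file, or null if the file is not compressed. */
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : CODECS_BY_EXTENSION.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    /** Current behavior as described: any compressed file is read as a single block. */
    static boolean shouldSplit(String fileName, boolean blockSplittable) {
        boolean compressed = codecFor(fileName) != null;
        return blockSplittable && !compressed;
    }

    public static void main(String[] args) {
        System.out.println(shouldSplit("data.csv", true));     // true
        System.out.println(shouldSplit("data.csv.bz2", true)); // false, even though bzip2 is splittable
        System.out.println(shouldSplit("data.csv.gz", true));  // false
        System.out.println(shouldSplit("data.csv", false));    // false
    }
}
```

The second call shows the problem. The codec is never consulted beyond "is there one?", so a splittable codec is treated the same as gzip.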
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java#L115
But some compression codecs are splittable, for example bzip2
(https://i.stack.imgur.com/jpprr.jpg). The codec type should be taken into
account when deciding whether a file can be split.
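A minimal Java sketch of a codec-aware version of the same decision, assuming the check simply asks whether the resolved codec is splittable. In Hadoop, splittable codecs implement the SplittableCompressionCodec interface: BZip2Codec does and GzipCodec does not. A real change would presumably test for that interface. To stay self-contained, the sketch models it with a hypothetical Codec enum carrying a splittable flag. CodecAwareSplitRule and shouldSplit are also illustrative names.

```java
import java.util.Optional;

class CodecAwareSplitRule {
    // Hypothetical codec model; the splittable flag mirrors whether the real
    // Hadoop codec implements SplittableCompressionCodec.
    enum Codec {
        GZIP(".gz", false),
        BZIP2(".bz2", true);

        final String extension;
        final boolean splittable;

        Codec(String extension, boolean splittable) {
            this.extension = extension;
            this.splittable = splittable;
        }
    }

    /** Resolves the codec from the file suffix; empty means the file is uncompressed. */
    static Optional<Codec> codecFor(String fileName) {
        for (Codec c : Codec.values()) {
            if (fileName.endsWith(c.extension)) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    /** Split if the format allows it and the codec, if any, is splittable. */
    static boolean shouldSplit(String fileName, boolean blockSplittable) {
        if (!blockSplittable) {
            return false;
        }
        // Uncompressed files keep today's behavior; compressed files defer to the codec.
        return codecFor(fileName).map(c -> c.splittable).orElse(true);
    }

    public static void main(String[] args) {
        System.out.println(shouldSplit("data.csv", true));      // true
        System.out.println(shouldSplit("data.csv.bz2", true));  // true
        System.out.println(shouldSplit("data.csv.gz", true));   // false
        System.out.println(shouldSplit("data.csv.bz2", false)); // false
    }
}
```

Changing the scheduling decision is likely only half of the work. Each reader would also need to open its byte range through the codec's splittable input stream, so that decompression starts at a block boundary rather than at the beginning of the file.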
--
This message was sent by Atlassian Jira
(v8.3.4#803005)