Arina Ielchiieva created DRILL-7419:
---------------------------------------

             Summary: Enhance Drill splitting logic for compressed files
                 Key: DRILL-7419
                 URL: https://issues.apache.org/jira/browse/DRILL-7419
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.16.0
            Reporter: Arina Ielchiieva


By default, Drill treats all compressed files as non-splittable. Drill uses
BlockMapBuilder to split a file into blocks where possible. According to its
code, it tries to split a file only if blockSplittable is set to true AND the
file IS NOT compressed. So even if a format is block-splittable, a file in that
format will not be split if it arrives compressed.
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java#L115

But some compression codecs are splittable, for example bzip2
(https://i.stack.imgur.com/jpprr.jpg). The codec type should be taken into
account when deciding whether a file can be split.
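A minimal, self-contained sketch of the difference between the current rule and the proposed one. Everything here is hypothetical and for illustration only: the class SplitDecision, the extension-to-splittability map, and the methods currentRule/proposedRule are not Drill code. In a real patch the codec would more plausibly be resolved through Hadoop's CompressionCodecFactory, and splittability checked with `codec instanceof SplittableCompressionCodec` (Hadoop's BZip2Codec implements that interface, while gzip's codec does not); that wiring is an assumption, not a description of the eventual fix.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: decide whether a file may be split into blocks,
 * taking its compression codec into account.
 */
public final class SplitDecision {

    // Hypothetical stand-in for codec lookup: file extension -> codec splittable?
    // bzip2 streams can be split; gzip streams cannot.
    private static final Map<String, Boolean> CODEC_SPLITTABLE = Map.of(
            ".bz2", true,
            ".gz", false);

    /** Returns the known codec extension the file name ends with, or null if uncompressed. */
    static String codecExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : CODEC_SPLITTABLE.keySet()) {
            if (lower.endsWith(ext)) {
                return ext;
            }
        }
        return null;
    }

    /** Current rule as described in the issue: split only if block-splittable AND uncompressed. */
    static boolean currentRule(boolean blockSplittable, String fileName) {
        return blockSplittable && codecExtension(fileName) == null;
    }

    /** Proposed rule: a compressed file may also be split when its codec is splittable. */
    static boolean proposedRule(boolean blockSplittable, String fileName) {
        if (!blockSplittable) {
            return false;
        }
        String ext = codecExtension(fileName);
        return ext == null || CODEC_SPLITTABLE.get(ext);
    }

    public static void main(String[] args) {
        String[] files = {"data.csv", "data.csv.bz2", "data.csv.gz"};
        for (String f : files) {
            System.out.printf("%-14s current=%b proposed=%b%n",
                    f, currentRule(true, f), proposedRule(true, f));
        }
    }
}
```

Under this sketch, data.csv.bz2 flips from non-splittable to splittable, while data.csv.gz stays non-splittable under both rules, and a format that is not block-splittable is never split regardless of compression.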



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
