[ https://issues.apache.org/jira/browse/DRILL-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956661#comment-16956661 ]

ASF GitHub Bot commented on DRILL-5674:
---------------------------------------

paul-rogers commented on pull request #1879: DRILL-5674: Support ZIP compression
URL: https://github.com/apache/drill/pull/1879#discussion_r337322883
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngFormatPlugin.java
 ##########
 @@ -47,7 +47,7 @@ public PcapngFormatPlugin(String name, DrillbitContext context, Configuration fs
 
   public PcapngFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig config, PcapngFormatConfig formatPluginConfig) {
     super(name, context, fsConf, config, formatPluginConfig, true,
-        false, true, false,
+        false, true, true,
 
 Review comment:
   Isn't the middle `true` wrong? It is for `blockSplittable`. That means we'll start reading at an arbitrary block boundary. Since this is a binary format, it is not clear that we can scan forward to the beginning of the next record, as can be done in Sequence File and (restricted) CSV.
   
   Also, if the file is zip-encoded, then it is never block splittable, since Zip files cannot be read at an arbitrary offset.
   
   This creates an issue: the block-splittable attribute is currently a constant, but any file that is zip-encoded is never block splittable. Is there any way to handle this?
   
   And is there any way to test this behaviour?
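One way to approach the question above, sketched under the assumption that splittability can be decided per file from its name: treat the format's constant `blockSplittable` flag as a default, and override it to `false` when the file carries a compression suffix that cannot be entered mid-stream. The class `SplitCheck`, the helper `isBlockSplittable`, and the suffix list are hypothetical illustrations, not Drill's API. A Hadoop-based variant could instead look up the codec with `CompressionCodecFactory.getCodec(Path)` and check whether it implements `SplittableCompressionCodec`.

```java
import java.util.List;
import java.util.Locale;

public class SplitCheck {

  // Hypothetical list of compression suffixes whose streams must be read
  // from the first byte: .zip and .gz cannot be entered at an arbitrary offset.
  private static final List<String> NON_SPLITTABLE_SUFFIXES = List.of(".zip", ".gz");

  /**
   * Hypothetical helper: a file is block splittable only if its format is
   * splittable AND it is not wrapped in a non-splittable compression format.
   */
  static boolean isBlockSplittable(String fileName, boolean formatIsBlockSplittable) {
    if (!formatIsBlockSplittable) {
      return false; // the format itself cannot resynchronize mid-file
    }
    String lower = fileName.toLowerCase(Locale.ROOT);
    for (String suffix : NON_SPLITTABLE_SUFFIXES) {
      if (lower.endsWith(suffix)) {
        return false; // compressed stream must be decoded from offset 0
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isBlockSplittable("data.csv", true));        // true
    System.out.println(isBlockSplittable("data.csv.zip", true));    // false
    System.out.println(isBlockSplittable("capture.pcapng", false)); // false
  }
}
```

A suffix list rather than "any compressed file" is used because not every codec is non-splittable; Hadoop's bzip2 codec, for example, is splittable.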
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> Drill should support .zip compression
> -------------------------------------
>
>                 Key: DRILL-5674
>                 URL: https://issues.apache.org/jira/browse/DRILL-5674
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.17.0
>
>
> Zip is a very common compression format. Create a compressed CSV file with column headers: data.csv.zip.
> Define a storage plugin config for the file, call it "dfs.myws", and set delimiter = ",", extract header = true, skip header = false.
> Run a simple query:
> SELECT * FROM dfs.myws.`data.csv.zip`
> The result is garbage because the CSV reader tries to parse the zipped data as if it were text.
> DRILL-5506 asks how to do this; the responder said to add a library to the path. It would be better to simply support zip out of the box as a default format.
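The garbage output described above comes from handing raw Zip bytes to the text reader. A minimal sketch of the difference using only `java.util.zip`, assuming the archive holds a single CSV entry: a Zip file begins with the `PK` local-header signature, and the CSV text (including the header line) only appears after the entry is decompressed. The class `ZipCsvDemo` and its methods are hypothetical and are not how Drill's reader is implemented.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipCsvDemo {

  // Build an in-memory Zip archive containing a single CSV entry.
  static byte[] makeZip(String entryName, String csv) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      zip.putNextEntry(new ZipEntry(entryName));
      zip.write(csv.getBytes(StandardCharsets.UTF_8));
      zip.closeEntry();
    }
    return bytes.toByteArray();
  }

  // Decompress the first entry and return its lines; the header is line 0.
  static List<String> readFirstEntryLines(byte[] zipBytes) throws IOException {
    List<String> lines = new ArrayList<>();
    try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
      if (zin.getNextEntry() == null) {
        return lines; // empty archive
      }
      // ZipInputStream reports end-of-stream at the end of the current entry.
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(zin, StandardCharsets.UTF_8));
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    byte[] zipBytes = makeZip("data.csv", "a,b\n1,2\n3,4\n");
    // Reading the raw bytes as text sees the "PK" signature, not the CSV header.
    String rawPrefix = new String(zipBytes, 0, 2, StandardCharsets.ISO_8859_1);
    System.out.println(rawPrefix);                     // PK
    System.out.println(readFirstEntryLines(zipBytes)); // [a,b, 1,2, 3,4]
  }
}
```

This sketch reads only the first entry; a real reader would also have to decide how to handle archives with several entries.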



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
