[
https://issues.apache.org/jira/browse/DRILL-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257915#comment-17257915
]
ASF GitHub Bot commented on DRILL-7834:
---------------------------------------
cgivre commented on pull request #2133:
URL: https://github.com/apache/drill/pull/2133#issuecomment-753733624
> @cgivre The `To EVF` work needs these good ideas. Is it possible to add a
> comment for `openPossiblyCompressedStream()` function to describe the
> differences of them?
@luocooong
Thanks for the review. The `openPossiblyCompressedStream()` function opens
an InputStream, but if the file is compressed you may get back a
ZipCompressedStream or something similar. In most cases this won't matter.
However, it caused problems in a proprietary plugin I was working on, which
reads a byte array: I'm not sure exactly why, but the Zip stream was breaking
the reader. The plugin in question also didn't work on S3 for the same reason.
I'm working on refactoring the LTSV plugin and ran into the same issue.
Hopefully this will make future development a little easier.
----------------------------------------------------------------
> Add Utility Functions for Compressed Files
> ------------------------------------------
>
> Key: DRILL-7834
> URL: https://issues.apache.org/jira/browse/DRILL-7834
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Minor
> Fix For: 1.19.0
>
>
> Some format plugins that use third-party parsers throw errors when they
> receive compressed input streams from Drill. This PR proposes adding three
> utility functions to the DrillFileSystem:
> # isCompressed(<path>): Returns true if the input file is compressed, false
> otherwise.
> # getCodec(<path>): Returns the compression codec of the file, if any.
> # openDecompressedInputStream(<path>): Returns an InputStream that should be
> readable by parsers that read raw bytes. This method first reads the original
> InputStream into a byte[], then returns it wrapped in a ByteArrayInputStream.
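
The three utilities described above could look roughly like the following. This is a minimal, standalone sketch and not the code from the pull request: the class name `CompressedFileUtils` is hypothetical, it works on `java.nio.file.Path` rather than on the DrillFileSystem, and it recognizes only gzip by file extension using `java.util.zip`, which is an assumption made to keep the example self-contained. It also assumes the buffered bytes are the decompressed content, as the method name suggests.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the proposed utilities; not Drill's implementation.
public class CompressedFileUtils {

  // Assumption: the codec is inferred from the file extension, and only
  // ".gz" is recognized in this sketch.
  public static Optional<String> getCodec(Path path) {
    String name = path.getFileName().toString();
    return name.endsWith(".gz") ? Optional.of("gzip") : Optional.empty();
  }

  public static boolean isCompressed(Path path) {
    return getCodec(path).isPresent();
  }

  // Reads the whole file (decompressing it if needed) into a byte[] and
  // returns a plain ByteArrayInputStream, so a parser that expects raw
  // bytes never sees a codec-specific stream class.
  public static InputStream openDecompressedInputStream(Path path) throws IOException {
    try (InputStream raw = Files.newInputStream(path);
         InputStream in = isCompressed(path) ? new GZIPInputStream(raw) : raw) {
      return new ByteArrayInputStream(in.readAllBytes());
    }
  }

  public static void main(String[] args) throws IOException {
    // Write a small gzip file, then read it back through the utility.
    Path file = Files.createTempFile("sample", ".csv.gz");
    try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
      out.write("a,b\n1,2\n".getBytes(StandardCharsets.UTF_8));
    }
    try (InputStream in = openDecompressedInputStream(file)) {
      System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
    Files.delete(file);
  }
}
```

Buffering the entire file in memory trades memory use for compatibility, so this approach is best suited to inputs that comfortably fit in the heap.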