cgivre commented on a change in pull request #2184:
URL: https://github.com/apache/drill/pull/2184#discussion_r585893757



##########
File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
##########
@@ -125,7 +125,7 @@ public boolean next() {
 
   private void openFile(FileSchemaNegotiator negotiator) {
     try {
-      fileReaderShp = negotiator.fileSystem().open(split.getPath());
+      fileReaderShp = negotiator.fileSystem().openDecompressedInputStream(split.getPath());

Review comment:
       @vvysotskyi 
   Thanks for your comment.  The issue that was happening on S3 was that the `InputStream` that gets passed to the format plugin appeared to be compressed in some way, even when the file itself was not compressed. 
   This didn't seem to matter for text-based file formats, but for formats that read binary data, such as `pcap`, the stream wasn't being decompressed, and as a result Drill couldn't parse the file. 
   
   The `openPossiblyCompressedStream()` method didn't really solve this issue, because you could still end up with a compressed stream that the format plugins couldn't read.  I thought another approach would be to put this logic in the format plugins themselves, but I couldn't figure out a way to determine whether the stream was compressed after calling `openPossiblyCompressedStream()`. 
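   One possible direction, sketched here only as an assumption that the file extension is a reliable signal: check the path against Hadoop's `CompressionCodecFactory` before opening, and only wrap the stream when a codec actually matches. Everything except the Hadoop API below is hypothetical:
   
   ```java
   import java.io.IOException;
   import java.io.InputStream;
   
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.io.compress.CompressionCodec;
   import org.apache.hadoop.io.compress.CompressionCodecFactory;
   
   public class CodecAwareOpen {
     // Returns a stream of decompressed bytes whether or not the file is compressed.
     public static InputStream openDecompressed(FileSystem fs, Path path, Configuration conf)
         throws IOException {
       // getCodec() matches on the file extension and returns null for plain files.
       CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
       InputStream raw = fs.open(path);
       if (codec == null) {
         return raw; // not compressed: binary readers can consume it directly
       }
       return codec.createInputStream(raw); // wrap so readers see decompressed bytes
     }
   }
   ```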
   
   Do you have any suggestions as to how to fix this so that we can avoid the OOM?



