[GitHub] [drill] vvysotskyi commented on a change in pull request #2186: DRILL-7874: Ensure DrillFSDataInputStream.read populates byte array of the requested length

GitBox Sat, 06 Mar 2021 08:12:13 -0800


vvysotskyi commented on a change in pull request #2186:
URL: https://github.com/apache/drill/pull/2186#discussion_r588899058




##########
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
##########
@@ -788,51 +785,18 @@ public void removeXAttr(final Path path, final String 
name) throws IOException {
 
   /**
    * Returns an InputStream from a Hadoop path. If the data is compressed, 
this method will return a compressed
-   * InputStream depending on the codec.  Note that if the results of this 
method are sent to a third party parser
-   * that works with bytes or individual characters directly, you should use 
the openDecompressedInputStream method.
+   * InputStream depending on the codec.
    * @param path Input file path
    * @return InputStream of opened file path
    * @throws IOException If the file is unreachable, unavailable or otherwise 
unreadable
    */
   public InputStream openPossiblyCompressedStream(Path path) throws 
IOException {
-    CompressionCodec codec = codecFactory.getCodec(path); // infers from file 
ext.
+    CompressionCodec codec = getCodec(path); // infers from file ext.
+    InputStream inputStream = open(path);

Review comment:
       Sorry, but I don't see how it reduces memory usage. When calling the 
`codec.createInputStream(open(path))`, Java will call the `open(path)` method, 
and after that will call the `createInputStream()` with its result, so these 
two versions of the code actually behave in the same way.

##########
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFSDataInputStream.java
##########
@@ -223,12 +224,28 @@ public int read(byte[] b, int off, int len) throws 
IOException {
     public int read(byte[] b) throws IOException {
       operatorStats.startWait();
       try {
-        return is.read(b);
+        return readBytes(b, 0, b.length);
       } finally {
         operatorStats.stopWait();
       }
     }
 
+    /**
+     * Reads up to {@code len} bytes of data from the input stream into an 
array of bytes.
+     * This method guarantees that regardless of the underlying stream 
implementation,
+     * the byte array will be populated with either {@code len} bytes or
+     * all available in stream bytes if they are less than {@code len}.
+     */
+    private int readBytes(byte[] b, int off, int len) throws IOException {

Review comment:
       `openDecompressedInputStream` doesn't behave as we want. For the case of 
non-compressed stream, it simply calls `open(path)`, so the issue will still be 
present. For the case of compressed stream, it converts all stream data to the 
byte array, and after that converts it to the `ByteArrayInputStream`, so we 
don't observe this issue, but such an approach may cause OOM, since the file 
may be too large.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [drill] vvysotskyi commented on a change in pull request #2186: DRILL-7874: Ensure DrillFSDataInputStream.read populates byte array of the requested length

Reply via email to