[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550595#comment-17550595 ]

ASF GitHub Bot commented on PARQUET-2134:
-----------------------------------------

steveloughran commented on code in PR #951:
URL: https://github.com/apache/parquet-mr/pull/951#discussion_r890388928


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java:
##########
@@ -51,7 +52,7 @@ public class HadoopStreams {
   public static SeekableInputStream wrap(FSDataInputStream stream) {
     Objects.requireNonNull(stream, "Cannot wrap a null input stream");
     if (byteBufferReadableClass != null && h2SeekableConstructor != null &&
-        byteBufferReadableClass.isInstance(stream.getWrappedStream())) {
+        isWrappedStreamByteBufferReadable(stream)) {

Review Comment:
   this is really going into the internals of the hadoop classes, and it is 
potentially tricky if there is any dynamic decision making in the inner class. 
The good news is I don't see anything doing that.
   
   there is a way (hadoop 3.2+) to ask a stream whether it supports the API 
before calling it, using the StreamCapabilities interface:
   https://issues.apache.org/jira/browse/HDFS-14111
   
   ```java
   if (stream.hasCapability("in:readbytebuffer")) {
     // stream is confident it has the api
   } else {
     // do the checking of the inner class
   }
   ```
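   
   a rough sketch of how the capability probe and the fallback check could 
combine (sketch only; the method name `supportsByteBufferReads` is 
illustrative, not part of the patch):
   
   ```java
   import java.io.InputStream;
   import org.apache.hadoop.fs.ByteBufferReadable;
   import org.apache.hadoop.fs.FSDataInputStream;
   
   // prefer the stream's self-declared capability (HDFS-14111, hadoop 3.2+),
   // then fall back to inspecting the wrapped stream directly
   static boolean supportsByteBufferReads(FSDataInputStream stream) {
     if (stream.hasCapability("in:readbytebuffer")) {
       return true; // the stream itself declares it implements the API
     }
     InputStream wrapped = stream.getWrappedStream();
     return wrapped instanceof ByteBufferReadable;
   }
   ```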



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java:
##########
@@ -66,6 +67,15 @@ public static SeekableInputStream wrap(FSDataInputStream stream) {
     }
   }
 
+  private static boolean isWrappedStreamByteBufferReadable(FSDataInputStream stream) {
+    InputStream wrapped = stream.getWrappedStream();
+    if (wrapped instanceof FSDataInputStream) {
+      return isWrappedStreamByteBufferReadable(((FSDataInputStream) wrapped));

Review Comment:
   you can't: the inner stream is set in the constructor, so it cannot take 
the not-yet-constructed class as an argument.
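   
   for context, an iterative equivalent of the recursive check in the diff 
above (a sketch only, assuming the goal is to find the innermost wrapped 
stream and test it):
   
   ```java
   import java.io.InputStream;
   import org.apache.hadoop.fs.ByteBufferReadable;
   import org.apache.hadoop.fs.FSDataInputStream;
   
   // unwrap nested FSDataInputStream layers until the innermost stream is
   // reached, then test whether it supports byte-buffer reads
   static boolean isWrappedStreamByteBufferReadable(FSDataInputStream stream) {
     InputStream wrapped = stream.getWrappedStream();
     while (wrapped instanceof FSDataInputStream) {
       wrapped = ((FSDataInputStream) wrapped).getWrappedStream();
     }
     return wrapped instanceof ByteBufferReadable;
   }
   ```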





> Incorrect type checking in HadoopStreams.wrap
> ---------------------------------------------
>
>                 Key: PARQUET-2134
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2134
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.8.3, 1.10.1, 1.11.2, 1.12.2
>            Reporter: Todd Gao
>            Priority: Minor
>
> The method 
> [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
>  wraps an FSDataInputStream in a SeekableInputStream. 
> It checks whether the underlying stream of the passed FSDataInputStream 
> implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an 
> H2SeekableInputStream; otherwise, in an H1SeekableInputStream.
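> A simplified sketch of that check (the real code resolves the classes 
> reflectively, but the effect is the same):
> {code:java}
> // only the immediate wrapped stream is inspected
> if (stream.getWrappedStream() instanceof ByteBufferReadable) {
>     return new H2SeekableInputStream(stream);
> } else {
>     return new H1SeekableInputStream(stream);
> }
> {code}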
> In some cases, we may add another wrapper over FSDataInputStream. For 
> example, 
> {code:java}
> class CustomDataInputStream extends FSDataInputStream {
>     public CustomDataInputStream(FSDataInputStream original) {
>         super(original);
>     }
> }
> {code}
> If we create an FSDataInputStream whose underlying stream does not 
> implement ByteBufferReadable, then wrap it in a CustomDataInputStream, and 
> finally use HadoopStreams.wrap to create a SeekableInputStream, we may get 
> an error like 
> {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported 
> by input stream{quote}
> We can fix this by recursively checking the underlying stream of the 
> FSDataInputStream.
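> A minimal sketch of such a recursive check (the method name is illustrative, 
> not the actual patch):
> {code:java}
> // recursively unwrap FSDataInputStream layers before the instanceof test
> private static boolean isByteBufferReadable(FSDataInputStream stream) {
>     InputStream wrapped = stream.getWrappedStream();
>     if (wrapped instanceof FSDataInputStream) {
>         return isByteBufferReadable((FSDataInputStream) wrapped);
>     }
>     return wrapped instanceof ByteBufferReadable;
> }
> {code}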


