[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550595#comment-17550595 ]
ASF GitHub Bot commented on PARQUET-2134:
-----------------------------------------

steveloughran commented on code in PR #951:
URL: https://github.com/apache/parquet-mr/pull/951#discussion_r890388928

##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java:
##########
@@ -51,7 +52,7 @@ public class HadoopStreams {
   public static SeekableInputStream wrap(FSDataInputStream stream) {
     Objects.requireNonNull(stream, "Cannot wrap a null input stream");
     if (byteBufferReadableClass != null && h2SeekableConstructor != null &&
-        byteBufferReadableClass.isInstance(stream.getWrappedStream())) {
+        isWrappedStreamByteBufferReadable(stream)) {

Review Comment:
   This is really going into the internals of the Hadoop classes, and potentially tricky if there is any dynamic decision making in the inner class. The good news there: I don't see anything doing that.

   There is a way to ask (Hadoop 3.2+) whether a stream supports the API before calling it, using the StreamCapabilities interface: https://issues.apache.org/jira/browse/HDFS-14111

   ```java
   if (stream.hasCapability("in:readbytebuffer")) {
     // stream is confident it has the API
   } else {
     // do the checking of the inner class
   }
   ```

##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java:
##########
@@ -66,6 +67,15 @@ public static SeekableInputStream wrap(FSDataInputStream stream) {
     }
   }

+  private static boolean isWrappedStreamByteBufferReadable(FSDataInputStream stream) {
+    InputStream wrapped = stream.getWrappedStream();
+    if (wrapped instanceof FSDataInputStream) {
+      return isWrappedStreamByteBufferReadable(((FSDataInputStream) wrapped));

Review Comment:
   You can't: the inner stream is set in the constructor, so it cannot take the not-yet-constructed class as an argument.
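As a self-contained illustration of the capability-probe-first pattern suggested in the review comment above: the `StreamCapabilities` interface here is a minimal stand-in modeled on Hadoop's, and the class names `CapabilityProbe`, `ProbingStream`, and `supportsByteBufferRead` are invented for this sketch; only the `hasCapability` method name and the `"in:readbytebuffer"` capability string mirror the real Hadoop 3.2+ API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class CapabilityProbe {

  /** Minimal stand-in for Hadoop's StreamCapabilities interface. */
  interface StreamCapabilities {
    boolean hasCapability(String capability);
  }

  /** Hypothetical stream that, like a Hadoop 3.2+ FSDataInputStream, can be asked directly. */
  static class ProbingStream extends ByteArrayInputStream implements StreamCapabilities {
    private final boolean byteBufferReadable;

    ProbingStream(byte[] data, boolean byteBufferReadable) {
      super(data);
      this.byteBufferReadable = byteBufferReadable;
    }

    @Override
    public boolean hasCapability(String capability) {
      // "in:readbytebuffer" is the capability string introduced by HDFS-14111.
      return "in:readbytebuffer".equals(capability) && byteBufferReadable;
    }
  }

  /** Prefer the capability probe; only fall back to class inspection when unavailable. */
  static boolean supportsByteBufferRead(InputStream stream) {
    if (stream instanceof StreamCapabilities) {
      return ((StreamCapabilities) stream).hasCapability("in:readbytebuffer");
    }
    // Fallback path: this is where the instanceof checks on the wrapped
    // stream (as in HadoopStreams.wrap) would go.
    return false;
  }

  public static void main(String[] args) {
    System.out.println(supportsByteBufferRead(new ProbingStream(new byte[0], true)));  // true
    System.out.println(supportsByteBufferRead(new ProbingStream(new byte[0], false))); // false
  }
}
```

The point of the pattern is that the stream itself reports support, so no assumptions about wrapper nesting or inner-class internals are needed on the happy path.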
> Incorrect type checking in HadoopStreams.wrap
> ---------------------------------------------
>
>                 Key: PARQUET-2134
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2134
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.8.3, 1.10.1, 1.11.2, 1.12.2
>            Reporter: Todd Gao
>            Priority: Minor
>
> The method [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51] wraps an FSDataInputStream in a SeekableInputStream.
> It checks whether the underlying stream of the passed FSDataInputStream implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an H2SeekableInputStream; otherwise, in an H1SeekableInputStream.
> In some cases, we may add another wrapper over FSDataInputStream. For example,
> {code:java}
> class CustomDataInputStream extends FSDataInputStream {
>     public CustomDataInputStream(FSDataInputStream original) {
>         super(original);
>     }
> }
> {code}
> Suppose we create an FSDataInputStream whose underlying stream does not implement ByteBufferReadable, then create a CustomDataInputStream around it. If we use HadoopStreams.wrap to create a SeekableInputStream, we may get an error like
> {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream{quote}
> We can fix this by checking the underlying stream of the FSDataInputStream recursively.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
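The recursive fix described above can be sketched without a Hadoop dependency as follows. All class names here (`UnwrapSketch`, `WrapperStream`, `ByteBufferStream`, and the marker `ByteBufferReadable` interface) are stand-ins invented for this illustration; `WrapperStream` plays the role of FSDataInputStream, exposing its inner stream the way `getWrappedStream()` does.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

public class UnwrapSketch {

  /** Marker interface standing in for org.apache.hadoop.fs.ByteBufferReadable. */
  interface ByteBufferReadable {}

  /** Stand-in for FSDataInputStream: a wrapper that exposes its inner stream. */
  static class WrapperStream extends FilterInputStream {
    WrapperStream(InputStream in) {
      super(in);
    }

    InputStream getWrappedStream() {
      return in;
    }
  }

  /** A leaf stream that supports ByteBuffer reads. */
  static class ByteBufferStream extends ByteArrayInputStream implements ByteBufferReadable {
    ByteBufferStream(byte[] data) {
      super(data);
    }
  }

  /**
   * Mirrors the proposed fix: peel off wrappers recursively until a
   * non-wrapper stream is reached, then test that innermost stream.
   */
  static boolean isWrappedStreamByteBufferReadable(WrapperStream stream) {
    InputStream wrapped = stream.getWrappedStream();
    if (wrapped instanceof WrapperStream) {
      return isWrappedStreamByteBufferReadable((WrapperStream) wrapped);
    }
    return wrapped instanceof ByteBufferReadable;
  }

  public static void main(String[] args) {
    // Double wrapping, as with the CustomDataInputStream example above.
    WrapperStream plain =
        new WrapperStream(new WrapperStream(new ByteArrayInputStream(new byte[0])));
    WrapperStream readable =
        new WrapperStream(new WrapperStream(new ByteBufferStream(new byte[0])));
    System.out.println(isWrappedStreamByteBufferReadable(plain));    // false
    System.out.println(isWrappedStreamByteBufferReadable(readable)); // true
  }
}
```

A single non-recursive instanceof check on `getWrappedStream()` would stop at the first wrapper and misclassify the doubly wrapped stream, which is exactly the bug reported here.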