[
https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550595#comment-17550595
]
ASF GitHub Bot commented on PARQUET-2134:
-----------------------------------------
steveloughran commented on code in PR #951:
URL: https://github.com/apache/parquet-mr/pull/951#discussion_r890388928
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java:
##########
@@ -51,7 +52,7 @@ public class HadoopStreams {
public static SeekableInputStream wrap(FSDataInputStream stream) {
Objects.requireNonNull(stream, "Cannot wrap a null input stream");
if (byteBufferReadableClass != null && h2SeekableConstructor != null &&
- byteBufferReadableClass.isInstance(stream.getWrappedStream())) {
+ isWrappedStreamByteBufferReadable(stream)) {
Review Comment:
This is really going into the internals of the Hadoop classes, and it's
potentially tricky if there is any dynamic decision making in the inner class.
The good news: I don't see anything doing that.
There is a way (Hadoop 3.2+) to ask whether a stream supports the API before
calling it, using the StreamCapabilities interface.
https://issues.apache.org/jira/browse/HDFS-14111
```java
if (stream.hasCapability("in:readbytebuffer")) {
  // stream is confident it has the api
} else {
  // do the checking of the inner class
}
```
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java:
##########
@@ -66,6 +67,15 @@ public static SeekableInputStream wrap(FSDataInputStream stream) {
}
}
+ private static boolean isWrappedStreamByteBufferReadable(FSDataInputStream stream) {
+ InputStream wrapped = stream.getWrappedStream();
+ if (wrapped instanceof FSDataInputStream) {
+ return isWrappedStreamByteBufferReadable(((FSDataInputStream) wrapped));
Review Comment:
You can't: the inner stream is set in the constructor, so it cannot take the
not-yet-constructed class as an argument.
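For illustration, here is a self-contained sketch of the recursive check under
discussion. The classes below are simplified stand-ins for the Hadoop types
(the real `FSDataInputStream` and `ByteBufferReadable` live in
`org.apache.hadoop.fs`), kept minimal so the unwrapping logic is the only thing
shown:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

public class UnwrapDemo {
    // Stand-in for org.apache.hadoop.fs.ByteBufferReadable.
    interface ByteBufferReadable {}

    // Stand-in for org.apache.hadoop.fs.FSDataInputStream: a wrapper
    // that exposes the stream it decorates.
    static class FSDataInputStream extends FilterInputStream {
        FSDataInputStream(InputStream in) { super(in); }
        InputStream getWrappedStream() { return in; }
    }

    // A raw stream that does support the byte-buffer read API.
    static class ByteBufferStream extends ByteArrayInputStream
            implements ByteBufferReadable {
        ByteBufferStream(byte[] buf) { super(buf); }
    }

    // The recursive check: peel off FSDataInputStream layers until a
    // non-wrapper stream is found, then test it for ByteBufferReadable.
    static boolean isWrappedStreamByteBufferReadable(FSDataInputStream stream) {
        InputStream wrapped = stream.getWrappedStream();
        if (wrapped instanceof FSDataInputStream) {
            return isWrappedStreamByteBufferReadable((FSDataInputStream) wrapped);
        }
        return wrapped instanceof ByteBufferReadable;
    }

    public static void main(String[] args) {
        InputStream raw = new ByteBufferStream(new byte[0]);
        // Double-wrapped: a single-level getWrappedStream() check would
        // see an FSDataInputStream, not the ByteBufferReadable raw stream.
        FSDataInputStream doubleWrapped =
            new FSDataInputStream(new FSDataInputStream(raw));
        System.out.println(isWrappedStreamByteBufferReadable(doubleWrapped)); // prints "true"

        FSDataInputStream plain =
            new FSDataInputStream(new ByteArrayInputStream(new byte[0]));
        System.out.println(isWrappedStreamByteBufferReadable(plain)); // prints "false"
    }
}
```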
> Incorrect type checking in HadoopStreams.wrap
> ---------------------------------------------
>
> Key: PARQUET-2134
> URL: https://issues.apache.org/jira/browse/PARQUET-2134
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.8.3, 1.10.1, 1.11.2, 1.12.2
> Reporter: Todd Gao
> Priority: Minor
>
> The method
> [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
> wraps an FSDataInputStream to a SeekableInputStream.
> It checks whether the underlying stream of the passed FSDataInputStream
> implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an
> H2SeekableInputStream; otherwise, in an H1SeekableInputStream.
> In some cases, we may add another wrapper over FSDataInputStream. For
> example,
> {code:java}
> class CustomDataInputStream extends FSDataInputStream {
> public CustomDataInputStream(FSDataInputStream original) {
> super(original);
> }
> }
> {code}
> When we create an FSDataInputStream whose underlying stream does not
> implement ByteBufferReadable, and then create a CustomDataInputStream over
> it, using HadoopStreams.wrap to create a SeekableInputStream may fail with
> an error like
> {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported
> by input stream{quote}
> We can fix this by recursively checking the underlying stream of the
> FSDataInputStream.
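The failure mode described above can be reproduced with simplified stand-ins
(not the real Hadoop classes). The real FSDataInputStream implements
ByteBufferReadable by delegating to its inner stream, so a one-level
`isInstance(getWrappedStream())` check on a double-wrapped stream sees the
inner FSDataInputStream and wrongly concludes the byte-buffer API is
supported; the mock below mirrors that delegation:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class WrapBugDemo {
    // Stand-in for org.apache.hadoop.fs.ByteBufferReadable.
    interface ByteBufferReadable {
        int read(ByteBuffer buf);
    }

    // Stand-in for FSDataInputStream: implements ByteBufferReadable by
    // delegating, and throws if the decorated stream lacks the API --
    // mirroring the UnsupportedOperationException in the report.
    static class FSDataInputStream extends FilterInputStream
            implements ByteBufferReadable {
        FSDataInputStream(InputStream in) { super(in); }
        InputStream getWrappedStream() { return in; }
        public int read(ByteBuffer buf) {
            if (in instanceof ByteBufferReadable) {
                return ((ByteBufferReadable) in).read(buf);
            }
            throw new UnsupportedOperationException(
                "Byte-buffer read unsupported by input stream");
        }
    }

    // Stand-in for the custom wrapper from the report.
    static class CustomDataInputStream extends FSDataInputStream {
        CustomDataInputStream(FSDataInputStream original) { super(original); }
    }

    public static void main(String[] args) {
        // The raw stream does NOT implement ByteBufferReadable.
        FSDataInputStream custom = new CustomDataInputStream(
            new FSDataInputStream(new ByteArrayInputStream(new byte[0])));

        // One-level check: sees the inner FSDataInputStream, which does
        // implement ByteBufferReadable -> wrongly picks the H2 path.
        boolean oneLevel = custom.getWrappedStream() instanceof ByteBufferReadable;
        System.out.println(oneLevel); // prints "true" (the wrong answer)

        // ...and the byte-buffer path then fails at read time:
        try {
            custom.read(ByteBuffer.allocate(8));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```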
--
This message was sent by Atlassian Jira
(v8.20.7#820007)