Todd Gao created PARQUET-2134:
---------------------------------
Summary: Incorrect type checking in HadoopStreams.wrap
Key: PARQUET-2134
URL: https://issues.apache.org/jira/browse/PARQUET-2134
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Todd Gao
The method
[HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
wraps an FSDataInputStream in a SeekableInputStream.
It checks whether the underlying stream of the passed FSDataInputStream
implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an
H2SeekableInputStream; otherwise, it wraps it in an H1SeekableInputStream.
In some cases, we may add another wrapper over FSDataInputStream. For example,
{code:java}
class CustomDataInputStream extends FSDataInputStream {
    public CustomDataInputStream(FSDataInputStream original) {
        super(original);
    }
}
{code}
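Suppose we create an FSDataInputStream whose underlying stream does not
implement ByteBufferReadable, and then create a CustomDataInputStream around
it. The underlying stream of the CustomDataInputStream is the inner
FSDataInputStream, which itself implements ByteBufferReadable, so
HadoopStreams.wrap picks H2SeekableInputStream. Reading from the resulting
SeekableInputStream may then fail with an error like
{quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported by
input stream{quote}.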
We can fix this by recursively checking the underlying stream: while the
wrapped stream is itself an FSDataInputStream, unwrap it and check again.
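
A minimal sketch of the idea, not the actual parquet-mr patch. To keep it self-contained, the types below are simplified stand-ins for the Hadoop classes rather than the real ones: ByteBufferReadable is reduced to a marker interface, FSDataInputStream only exposes getWrappedStream(), and BufferCapableStream, shallowCheck and recursiveCheck are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

final class UnwrapSketch {
    /** Stand-in for org.apache.hadoop.fs.ByteBufferReadable (marker only in this sketch). */
    interface ByteBufferReadable {}

    /** Stand-in for FSDataInputStream: it declares ByteBufferReadable no matter what it wraps. */
    static class FSDataInputStream extends FilterInputStream implements ByteBufferReadable {
        FSDataInputStream(InputStream in) { super(in); }
        InputStream getWrappedStream() { return in; }
    }

    /** The extra wrapper from the report. */
    static class CustomDataInputStream extends FSDataInputStream {
        CustomDataInputStream(FSDataInputStream original) { super(original); }
    }

    /** Hypothetical raw stream that does support byte-buffer reads. */
    static class BufferCapableStream extends ByteArrayInputStream implements ByteBufferReadable {
        BufferCapableStream() { super(new byte[0]); }
    }

    /** Single-level check, as described in the report. */
    static boolean shallowCheck(FSDataInputStream stream) {
        return stream.getWrappedStream() instanceof ByteBufferReadable;
    }

    /** Proposed check: unwrap nested FSDataInputStreams, then test the innermost stream. */
    static boolean recursiveCheck(FSDataInputStream stream) {
        InputStream wrapped = stream.getWrappedStream();
        if (wrapped instanceof FSDataInputStream) {
            return recursiveCheck((FSDataInputStream) wrapped);
        }
        return wrapped instanceof ByteBufferReadable;
    }

    public static void main(String[] args) {
        // Innermost stream does NOT support byte-buffer reads.
        FSDataInputStream plain = new FSDataInputStream(new ByteArrayInputStream(new byte[0]));
        FSDataInputStream custom = new CustomDataInputStream(plain);
        System.out.println(shallowCheck(custom));   // true: fooled by the inner FSDataInputStream
        System.out.println(recursiveCheck(custom)); // false: sees the real innermost stream

        // Innermost stream DOES support byte-buffer reads.
        FSDataInputStream capable =
            new CustomDataInputStream(new FSDataInputStream(new BufferCapableStream()));
        System.out.println(recursiveCheck(capable)); // true
    }
}
```

With a check like recursiveCheck, wrap would choose H1SeekableInputStream for the CustomDataInputStream case above and H2SeekableInputStream only when the innermost stream really supports byte-buffer reads.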