[
https://issues.apache.org/jira/browse/HADOOP-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589371#comment-16589371
]
Thomas Marquardt commented on HADOOP-15688:
-------------------------------------------
[^HADOOP-15688-HADOOP-15407-002.patch]
I noticed ABFS has a similar issue with wrapping output streams too. I checked
with Da and we think this was an accident that occurred during an earlier
refactor, as there is no need to wrap the stream twice with FSDataInputStream
or FSDataOutputStream.
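To illustrate why the double wrap misleads callers, here is a minimal,
self-contained sketch. It does not use the real Hadoop or Parquet classes:
DoubleWrapSketch, Wrapper, looksByteBufferReadable and byteBufferReadWorks are
hypothetical stand-ins, and the nested ByteBufferReadable interface only
imitates the Hadoop interface of the same name. The assumption is that the
wrapper always declares ByteBufferReadable and delegates at call time, while
the reader only inspects the directly wrapped stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class DoubleWrapSketch {
    /** Hypothetical stand-in for Hadoop's ByteBufferReadable interface. */
    interface ByteBufferReadable {
        int read(ByteBuffer buf) throws IOException;
    }

    /**
     * Hypothetical stand-in for FSDataInputStream: it always declares
     * ByteBufferReadable, but can only honor it if the wrapped stream does.
     */
    static class Wrapper extends InputStream implements ByteBufferReadable {
        private final InputStream in;

        Wrapper(InputStream in) {
            this.in = in;
        }

        InputStream getWrappedStream() {
            return in;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(ByteBuffer buf) throws IOException {
            if (in instanceof ByteBufferReadable) {
                return ((ByteBufferReadable) in).read(buf);
            }
            throw new UnsupportedOperationException(
                "Byte-buffer read unsupported by input stream");
        }
    }

    /** Reader-side capability check that looks only one level down (assumed). */
    static boolean looksByteBufferReadable(Wrapper stream) {
        return stream.getWrappedStream() instanceof ByteBufferReadable;
    }

    /** Tries an actual ByteBuffer read and reports whether it succeeded. */
    static boolean byteBufferReadWorks(Wrapper stream) {
        try {
            stream.read(ByteBuffer.allocate(4));
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The raw stream does not implement ByteBufferReadable.
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        Wrapper single = new Wrapper(raw);
        Wrapper doubled = new Wrapper(new Wrapper(raw));

        System.out.println("single:  looks=" + looksByteBufferReadable(single)
            + " works=" + byteBufferReadWorks(single));
        System.out.println("doubled: looks=" + looksByteBufferReadable(doubled)
            + " works=" + byteBufferReadWorks(doubled));
    }
}
```

In this sketch the singly wrapped stream is correctly reported as not
supporting ByteBuffer reads, whereas the doubly wrapped stream passes the
one-level check even though the read still fails on the raw stream underneath.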
I have attached patch 002 which fixes this for all streams. All tests pass
against my US storage account:
*Tests run: 265, Failures: 0, Errors: 0, Skipped: 11*
*Tests run: 1, Failures: 0, Errors: 0, Skipped: 0*
*Tests run: 861, Failures: 0, Errors: 0, Skipped: 262*
*Tests run: 186, Failures: 0, Errors: 0, Skipped: 10*
Regarding the timeout issue, I do use an Azure VM to run the tests which helps
reduce latency, but we should make the tests pass regardless. The test cases
have various timeouts which we can increase. Just let us know which ones are
causing trouble. We are also working to improve parallelization of the tests,
which will reduce total run time.
> ABFS: InputStream wrapped in FSDataInputStream twice
> ----------------------------------------------------
>
> Key: HADOOP-15688
> URL: https://issues.apache.org/jira/browse/HADOOP-15688
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Priority: Major
> Attachments: HADOOP-15688-HADOOP-15407-002.patch,
> HADOOP-15688.001.patch
>
>
> I can't read Parquet files from ABFS. Parquet has 2 different implementations to
> read seekable streams, and it'll use the one that uses ByteBuffer reads if it
> can. It currently decides to use the ByteBuffer read implementation because
> the FSDataInputStream it gets back wraps another FSDataInputStream, which
> implements ByteBufferReadable.
> That's not the most robust way to check that ByteBufferReads are supported by
> the ultimately underlying InputStream, but it's unnecessary and probably a
> mistake to double-wrap the InputStream, so let's not.