[
https://issues.apache.org/jira/browse/HADOOP-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385373#comment-17385373
]
Steve Loughran commented on HADOOP-17812:
-----------------------------------------
stack
{code}
17:22:13 Caused by: java.lang.NullPointerException
17:22:13 	at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$read$3(S3AInputStream.java:450)
17:22:13 	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
17:22:13 	at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
17:22:13 	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
17:22:13 	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
17:22:13 	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:236)
17:22:13 	at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:446)
17:22:13 	at java.io.DataInputStream.readFully(DataInputStream.java:195)
17:22:13 	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.copyDataRange(GpuParquetScan.scala:509)
17:22:13 	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.copyDataRange$(GpuParquetScan.scala:497)
17:22:13 	at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader.copyDataRange(GpuParquetScan.scala:975)
17:22:13 	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$copyBlocksData$3(GpuParquetScan.scala:580)
17:22:13 	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$copyBlocksData$3$adapted(GpuParquetScan.scala:580)
17:22:13 	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
{code}
> NPE when reading s3a
> --------------------
>
> Key: HADOOP-17812
> URL: https://issues.apache.org/jira/browse/HADOOP-17812
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Bobby Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When [reading from S3a
> storage|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450],
> an SSLException (which extends IOException) can occur, which triggers
> [onReadFailure|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L458].
> onReadFailure calls "reopen", which first closes the original
> *wrappedStream* and sets *wrappedStream = null*, and then tries to
> [re-obtain
> *wrappedStream*|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L184].
> But if the preceding code that [obtains the
> S3Object|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L183]
> throws an exception, *wrappedStream* remains null.
> The
> [retry|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L446]
> mechanism may then re-execute
> [wrappedStream.read|https://github.com/apache/hadoop/blob/rel/release-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L450]
> and cause an NPE.
>
> For more details, please refer to
> [https://github.com/NVIDIA/spark-rapids/issues/2915]
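The sequence described above can be reduced to a minimal, self-contained Java sketch (hypothetical class and field names, not the actual S3AInputStream code): a retry loop re-executes a read against a stream field that a failed reopen left null, so the second attempt surfaces as a NullPointerException rather than a retryable IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RetryNpeSketch {
    // Stands in for S3AInputStream.wrappedStream. The initial stream fails
    // every read, simulating a dropped connection.
    private InputStream wrappedStream = new InputStream() {
        @Override
        public int read() throws IOException {
            throw new IOException("simulated connection reset");
        }
    };

    private boolean reopenFails = true; // simulate the GET failing on reopen

    // Analogous to onReadFailure -> reopen: close the old stream, null the
    // field, then try to fetch the object again. If that fetch throws, the
    // field is left null.
    private void reopen() throws IOException {
        if (wrappedStream != null) {
            wrappedStream.close();
        }
        wrappedStream = null;
        if (reopenFails) {
            throw new IOException("simulated SSLException while re-opening");
        }
        // Successful path; never reached in this simulation.
        wrappedStream = new ByteArrayInputStream(new byte[]{42});
    }

    // Analogous to Invoker.retry re-executing the read lambda after an
    // IOException. Only IOException is caught, so the NPE escapes.
    public int read() throws IOException {
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                // BUG: on the retry attempt, wrappedStream is still null
                // because the previous reopen failed.
                return wrappedStream.read();
            } catch (IOException e) {
                try {
                    reopen();
                } catch (IOException ignored) {
                    // The retry loop continues with wrappedStream == null.
                }
            }
        }
        throw new IOException("out of retries");
    }
}
```

A guard such as re-running reopen (or checking for null) before the retried read turns the NPE back into an ordinary, retryable IOException.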
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]