[ https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-24683 started by Ádám Szita.
-----------------------------------------
> NPE in Hadoop23Shims due to non-existing delete delta paths
> -----------------------------------------------------------
>
> Key: HIVE-24683
> URL: https://issues.apache.org/jira/browse/HIVE-24683
> Project: Hive
> Issue Type: Bug
> Reporter: Ádám Szita
> Assignee: Ádám Szita
> Priority: Major
>
> HIVE-23840 introduced reading delete deltas from the LLAP cache when they
> are available there. That refactor opened up the possibility of an NPE:
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581){code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket
> by looking at the bucket number (from the corresponding split), but this file
> may not exist if no deletion happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path
> and catching FileNotFoundException. However, in the refactor we first try to
> look into the cache, and for that we try to retrieve a file ID first. This
> entails a getFileStatus call on HDFS, which returns null for non-existing
> paths, eventually causing the NPE.
> This needs to be guarded by a null check in Hadoop23Shims.
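
A minimal sketch of the kind of null check the description asks for. It is
not the actual Hadoop23Shims code or the committed patch. StatusLookup and
FileInfo are hypothetical stand-ins for the HDFS client call and its result,
and throwing FileNotFoundException is an assumption, chosen because the
description says callers used to handle that exception for missing delete
delta files.

```java
import java.io.FileNotFoundException;

class FileIdLookupSketch {

    /** Hypothetical stand-in for the status object returned by the HDFS client. */
    static final class FileInfo {
        private final long fileId;

        FileInfo(long fileId) {
            this.fileId = fileId;
        }

        long getFileId() {
            return fileId;
        }
    }

    /** Hypothetical stand-in for a lookup that returns null for a non-existing path. */
    interface StatusLookup {
        FileInfo getFileInfo(String path);
    }

    /**
     * Returns the file ID for the path, or throws FileNotFoundException
     * instead of dereferencing a null status (assumed behaviour, see above).
     */
    static long getFileId(StatusLookup client, String path) throws FileNotFoundException {
        FileInfo info = client.getFileInfo(path);
        if (info == null) {
            throw new FileNotFoundException(path + " does not exist");
        }
        return info.getFileId();
    }

    public static void main(String[] args) {
        // Only bucket_00000 "exists" in this toy lookup.
        StatusLookup lookup = p -> p.endsWith("bucket_00000") ? new FileInfo(42L) : null;
        String[] paths = {
            "/warehouse/t/delete_delta_1_1/bucket_00000",
            "/warehouse/t/delete_delta_1_1/bucket_00001"
        };
        for (String path : paths) {
            try {
                System.out.println(path + " -> file ID " + getFileId(lookup, path));
            } catch (FileNotFoundException e) {
                // The caller can treat this as "no delete events for this bucket".
                System.out.println("No delete delta file: " + e.getMessage());
            }
        }
    }
}
```

With a check like this, a missing delete delta bucket surfaces as a
FileNotFoundException that the reader can handle as before, rather than as an
NPE from inside the shim.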
--
This message was sent by Atlassian Jira
(v8.3.4#803005)