[
https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-24683:
----------------------------------
Labels: pull-request-available (was: )
> Hadoop23Shims getFileId prone to NPE for non-existing paths
> -----------------------------------------------------------
>
> Key: HIVE-24683
> URL: https://issues.apache.org/jira/browse/HIVE-24683
> Project: Hive
> Issue Type: Bug
> Reporter: Ádám Szita
> Assignee: Ádám Szita
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HIVE-23840 introduced the feature of reading delete deltas from the LLAP
> cache if it's available. That refactor made the following NPE possible:
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
>     at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
>     at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
>     at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
>     at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581)
> {code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket
> by looking at the bucket number (from the corresponding split), but this file
> may not exist if no deletions happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path
> and catching FileNotFoundException. However, after the refactor we first
> look into the cache, and for that we need to retrieve a file ID. This
> entails a getFileStatus call on HDFS, which returns null for non-existing
> paths, eventually causing the NPE.
> This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId
> should be refactored so that it is no longer error-prone.
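
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use the Hadoop or Hive APIs: FakeClient, FileInfo, getFileIdUnsafe and the example paths are hypothetical stand-ins, and the null check shown is only one possible shape for a safer getFileId, not necessarily the change made for this issue.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

final class FileIdLookupSketch {

    /** Hypothetical stand-in for the file status object returned by the client. */
    static final class FileInfo {
        private final long fileId;

        FileInfo(long fileId) {
            this.fileId = fileId;
        }

        long getFileId() {
            return fileId;
        }
    }

    /** Hypothetical stand-in for the HDFS client: returns null for a missing path. */
    static final class FakeClient {
        private final Map<String, FileInfo> files = new HashMap<>();

        void add(String path, long fileId) {
            files.put(path, new FileInfo(fileId));
        }

        FileInfo getFileInfo(String path) {
            return files.get(path); // null when the path does not exist
        }
    }

    /** Error-prone shape: dereferences the lookup result without a null check. */
    static long getFileIdUnsafe(FakeClient client, String path) {
        return client.getFileInfo(path).getFileId(); // NPE for a missing path
    }

    /** Safer shape: a missing path becomes a FileNotFoundException. */
    static long getFileId(FakeClient client, String path) throws FileNotFoundException {
        FileInfo info = client.getFileInfo(path);
        if (info == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return info.getFileId();
    }
}
```

With this shape, a caller that already handles FileNotFoundException for absent delete delta buckets (as the pre-refactor ORC reader path did) would keep working when the file ID lookup is attempted first.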
--
This message was sent by Atlassian Jira
(v8.3.4#803005)