[jira] [Updated] (HIVE-24683) Hadoop23Shims getFileId prone to NPE for non-existing paths

ASF GitHub Bot (Jira) Tue, 26 Jan 2021 05:39:05 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HIVE-24683:
----------------------------------
    Labels: pull-request-available  (was: )

> Hadoop23Shims getFileId prone to NPE for non-existing paths
> -----------------------------------------------------------
>
>                 Key: HIVE-24683
>                 URL: https://issues.apache.org/jira/browse/HIVE-24683
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23840 introduced reading delete deltas from the LLAP cache when it is 
> available. That refactor makes the following NPE possible:
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581)
> {code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
> from the bucket number of the corresponding split, but this file may not 
> exist if no deletion happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path 
> and catching FileNotFoundException. After the refactor, however, we first 
> look into the cache, and for that we need to retrieve a file ID. This 
> entails a getFileStatus call on HDFS, which returns null for non-existing 
> paths and eventually causes the NPE.
> This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId 
> should be refactored so that it is no longer error-prone.
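
One way such a refactor might look is sketched below. It is only an illustration, not the change made for this issue: FileInfo and FileInfoClient are hypothetical stand-ins for the HDFS client types (assumed to return null for a missing path, as the description says), and the path used in main is made up. The idea is to turn the null status into a FileNotFoundException, so callers can go back to the earlier catch-and-skip handling.

```java
import java.io.FileNotFoundException;

class FileIdSketch {

  /** Hypothetical stand-in for the file status object returned by the HDFS client. */
  static final class FileInfo {
    private final long fileId;

    FileInfo(long fileId) {
      this.fileId = fileId;
    }

    long getFileId() {
      return fileId;
    }
  }

  /** Hypothetical stand-in for the client; assumed to return null if the path does not exist. */
  interface FileInfoClient {
    FileInfo getFileInfo(String path);
  }

  /**
   * Null-safe lookup: a missing path surfaces as FileNotFoundException
   * instead of a NullPointerException on the dereference.
   */
  static long getFileId(FileInfoClient client, String path) throws FileNotFoundException {
    FileInfo info = client.getFileInfo(path);
    if (info == null) {
      throw new FileNotFoundException("File does not exist: " + path);
    }
    return info.getFileId();
  }

  public static void main(String[] args) {
    // Made-up path and file ID; only this one path "exists" for the fake client.
    String existing = "/warehouse/t/delete_delta_1_1/bucket_00000";
    FileInfoClient client = p -> existing.equals(p) ? new FileInfo(16386L) : null;

    String[] candidates = {existing, "/warehouse/t/delete_delta_1_1/bucket_00001"};
    for (String path : candidates) {
      try {
        System.out.println(path + " -> file ID " + getFileId(client, path));
      } catch (FileNotFoundException e) {
        // No deletes were written for this bucket, so there is nothing to read.
        System.out.println(path + " -> skipped (" + e.getMessage() + ")");
      }
    }
  }
}
```

With this shape, a caller that infers a delete delta file name from the bucket number can treat a missing file the same way before and after consulting the cache.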



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
