[ https://issues.apache.org/jira/browse/HIVE-25877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-25877:
---------------------------------------
    Description: 
As part of the direct insert optimisation (the same issue exists for MM tables 
even without the direct insert optimisation), the files produced by Tez jobs are 
moved into the table directory for ACID tables, and duplicate removal is then 
performed. Each session scans through the table directory and cleans up only the 
files belonging to that session, but the listing iterator is created over all 
the files. As a result, a FileNotFoundException is thrown when multiple sessions 
act on the same table and the first session cleans up its data while the second 
session is still iterating over it.
{code:java}
Caused by: java.io.FileNotFoundException: File 
hdfs://mbehera-1.mbehera.root.hwx.site:8020/warehouse/tablespace/managed/hive/tbl4/_tmp.delta_0000981_0000981_0000
 does not exist.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) 
~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332) 
~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309) 
~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidatesRecursive(Utilities.java:4447)
 ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidates(Utilities.java:4413)
 ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2816)
 ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT] {code}
 

The code path below is fixed by HIVE-24682:
{code:java}
Caused by: java.io.FileNotFoundException: File 
hdfs://mbehera-1.mbehera.root.hwx.site:8020/warehouse/tablespace/managed/hive/tbl4/.hive-staging_hive_2022-01-19_05-18-38_933_1683918321120508074-54
 does not exist.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208)
 ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) 
~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332) 
~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309) 
~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
        at 
org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidatesRecursive(Utilities.java:4447)
 ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidates(Utilities.java:4413)
 ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.Utilities.getFullDPSpecs(Utilities.java:2971) 
~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT] {code}
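The race described above can be sketched with plain `java.nio.file` APIs: a recursive listing that tolerates entries vanishing between the directory listing and the per-entry stat, instead of failing the whole scan. This is only an illustrative sketch of the defensive pattern (the class name `SafeListing` and the helper are hypothetical); the actual Hive code path is `Utilities.getDirectInsertDirectoryCandidatesRecursive` over a Hadoop `RemoteIterator`, not `java.nio.file`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class SafeListing {

    // Recursively collect regular files, skipping entries that another
    // session deletes between the listing and the stat (hypothetical
    // sketch of the tolerate-missing-files pattern).
    static List<Path> listSafely(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                try {
                    BasicFileAttributes attrs =
                        Files.readAttributes(entry, BasicFileAttributes.class);
                    if (attrs.isDirectory()) {
                        result.addAll(listSafely(entry));
                    } else {
                        result.add(entry);
                    }
                } catch (NoSuchFileException | FileNotFoundException e) {
                    // Entry vanished after listing: a concurrent session
                    // removed its files; skip it rather than abort the scan.
                }
            }
        } catch (NoSuchFileException e) {
            // The whole directory vanished; treat it as empty.
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tbl");
        Files.createFile(dir.resolve("delta_0000981"));
        System.out.println(listSafely(dir).size()); // prints 1
    }
}
```

Without the inner catch, a file deleted by another session mid-iteration surfaces exactly as the FileNotFoundException in the stack traces above.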



> Load table from concurrent thread causes FileNotFoundException
> --------------------------------------------------------------
>
>                 Key: HIVE-25877
>                 URL: https://issues.apache.org/jira/browse/HIVE-25877
>             Project: Hive
>          Issue Type: Bug
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
