BalaMahesh opened a new issue #2251: URL: https://github.com/apache/hudi/issues/2251
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

When running `SELECT col1, col2, ...` queries on Hudi tables, I am getting the error `org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a//path`. However, if I run `hdfs dfs -cat` on the same file, I am able to see the data. This does not happen in every case: some queries return results, but most of them fail. If I run `select count(*), dt from _ro group by dt`, it does not throw any error. Where could the problem be?

**To Reproduce**

Steps to reproduce the behavior:

1. Ingest data with DeltaStreamer.
2. Query the `_ro` table.

**Expected behavior**

The query should return the rows.

**Environment Description**

* Hudi version : 0.6.1
* Spark version : 2.4.7
* Hive version : 1.2
* Hadoop version : 2.7.1
* Storage (HDFS/S3/GCS..) : s3a
* Running on Docker? (yes/no) : no
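The reproduction steps above can be sketched roughly as follows. This is a hypothetical sketch, not the reporter's actual commands: the table name, paths, source class, and properties file are placeholders, assuming Hudi 0.6.1's bundled DeltaStreamer and a Hive connection via beeline.

```shell
# 1. Ingest data with DeltaStreamer (MERGE_ON_READ produces the _ro table).
#    All paths and names below are illustrative placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle_2.11-0.6.1.jar \
  --table-type MERGE_ON_READ \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field ts \
  --target-base-path s3a://bucket/test/hudi/data/example_table \
  --target-table example_table \
  --props dfs-source.properties

# 2. Query the read-optimized (_ro) table from Hive.
beeline -u jdbc:hive2://hiveserver:10000 \
  -e "SELECT col1, col2 FROM example_table_ro WHERE dt = '2020-11-12';"
```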
**Stacktrace**

```
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a://xx/test/hudi/data/xx/xx/dt=2020-11-12/.hoodie_partition_metadata
	at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:155)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:237)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:105)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1599034786224_1149165_1_00, diagnostics=[Vertex vertex_1599034786224_1149165_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: accounting_new_zealand_crn_tracker_ro initializer failed, vertex=vertex_1599034786224_1149165_1_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a://xx/test/hudi/data/xx/xx/dt=2020-11-12/.hoodie_partition_metadata
	at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:155)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:237)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:105)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
```

Running `dfs -cat s3a://xx/test/hudi/data/xx/xx/dt=2020-11-12/.hoodie_partition_metadata;` returns:

```
#partition metadata
#Thu Nov 12 06:14:36 IST 2020
commitTime=20201112061416
partitionDepth=1
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
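One way to cross-check the discrepancy reported above from the shell is to compare a directory listing against a direct read of the same file. This is a hypothetical sketch with a placeholder path, assuming the Hadoop client is configured for the same s3a bucket:

```shell
# Placeholder path; substitute the partition that the failing query touched.
P=s3a://bucket/test/hudi/data/example_table/dt=2020-11-12

# Listing the partition: if the listing does not surface
# .hoodie_partition_metadata, split generation fails with
# InvalidInputException, as in the stacktrace above.
hdfs dfs -ls "$P"

# Reading the file directly: in the report this succeeds, which suggests a
# listing/metadata inconsistency rather than a genuinely missing file.
hdfs dfs -cat "$P/.hoodie_partition_metadata"
```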
