BalaMahesh opened a new issue #2251: URL: https://github.com/apache/hudi/issues/2251
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

When running `SELECT col1, col2, ...` queries on Hudi tables, I am getting the error `org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a//path`. However, if I run `hdfs dfs -cat` on the same file, I am able to see the data. This does not happen in every case: some queries return results, but most of them fail. If I run `select count(*), dt from _ro group by dt`, it does not throw any error. Where could the problem be?

**To Reproduce**

Steps to reproduce the behavior:

1. Ingest data with DeltaStreamer.
2. Query the `_ro` table.

**Expected behavior**

The query should return the rows.

**Environment Description**

* Hudi version : 0.6.1
* Spark version : 2.4.7
* Hive version : 1.2
* Hadoop version : 2.7.1
* Storage (HDFS/S3/GCS..) : s3a
* Running on Docker? (yes/no) : no
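The reproduction steps above can be sketched roughly as follows. This is a hypothetical sketch, not the reporter's actual commands: the table name, paths, source class, and properties file are placeholders, assuming Hudi 0.6.1's bundled DeltaStreamer and a Hive connection via beeline.

```shell
# 1. Ingest data with DeltaStreamer (MERGE_ON_READ produces the _ro table).
#    All paths and names below are illustrative placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle_2.11-0.6.1.jar \
  --table-type MERGE_ON_READ \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field ts \
  --target-base-path s3a://bucket/test/hudi/data/example_table \
  --target-table example_table \
  --props dfs-source.properties

# 2. Query the read-optimized (_ro) table from Hive.
beeline -u jdbc:hive2://hiveserver:10000 \
  -e "SELECT col1, col2 FROM example_table_ro WHERE dt = '2020-11-12';"
```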
**Stacktrace**

```
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a://xx/test/hudi/data/xx/xx/dt=2020-11-12/.hoodie_partition_metadata
	at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:155)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:237)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:105)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1599034786224_1149165_1_00, diagnostics=[Vertex vertex_1599034786224_1149165_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: accounting_new_zealand_crn_tracker_ro initializer failed, vertex=vertex_1599034786224_1149165_1_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3a://xx/test/hudi/data/xx/xx/dt=2020-11-12/.hoodie_partition_metadata
	at org.apache.hadoop.mapred.LocatedFileStatusFetcher.getFileStatuses(LocatedFileStatusFetcher.java:155)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:237)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:105)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
```

Running `dfs -cat s3a://xx/test/hudi/data/xx/xx/dt=2020-11-12/.hoodie_partition_metadata;` returns:

```
#partition metadata
#Thu Nov 12 06:14:36 IST 2020
commitTime=20201112061416
partitionDepth=1
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
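One way to cross-check the discrepancy reported above from the shell is to compare a directory listing against a direct read of the same file. This is a hypothetical sketch with a placeholder path, assuming the Hadoop client is configured for the same s3a bucket:

```shell
# Placeholder path; substitute the partition that the failing query touched.
P=s3a://bucket/test/hudi/data/example_table/dt=2020-11-12

# Listing the partition: if the listing does not surface
# .hoodie_partition_metadata, split generation fails with
# InvalidInputException, as in the stacktrace above.
hdfs dfs -ls "$P"

# Reading the file directly: in the report this succeeds, which suggests a
# listing/metadata inconsistency rather than a genuinely missing file.
hdfs dfs -cat "$P/.hoodie_partition_metadata"
```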
