bigdata-spec opened a new issue, #9829:
URL: https://github.com/apache/hudi/issues/9829
**Environment Description**
* Hudi version : 0.13.1
* Spark version : 3.3.2
* Hive version : 2.1.0-cdh6.3.2
* Hadoop version : 3.0.0-cdh6.3.2
* Storage (HDFS/S3/GCS..) : hdfs
* Running on Docker? (yes/no) : no
Hi,I meet an error by query hudi table.
where I run
```
select count(*),dt
from zone_dw.dws_xx_refresh_hi group by dt
```
```
2023-10-07 15:13:42.013 ERROR io.trino.spi.TrinoException: Error occurs when
executing flatMap
at
io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:276)
at
io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
at io.trino.$gen.Trino_360____20230925_110711_2.run(Unknown Source)
at
io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hudi.exception.HoodieException: Error occurs when
executing flatMap
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingFlatMapWrapper$1(FunctionWrapper.java:50)
at
java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:271)
at
java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at
java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:952)
at
java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:926)
at
java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
at
java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
at
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at
java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
at
java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
at
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:919)
at
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at
java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at
org.apache.hudi.common.engine.HoodieLocalEngineContext.flatMap(HoodieLocalEngineContext.java:118)
at
org.apache.hudi.metadata.FileSystemBackedTableMetadata.getPartitionPathWithPathPrefix(FileSystemBackedTableMetadata.java:109)
at
org.apache.hudi.metadata.FileSystemBackedTableMetadata.lambda$getPartitionPathWithPathPrefixes$0(FileSystemBackedTableMetadata.java:91)
at
java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:271)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at
java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at
org.apache.hudi.metadata.FileSystemBackedTableMetadata.getPartitionPathWithPathPrefixes(FileSystemBackedTableMetadata.java:95)
at
org.apache.hudi.BaseHoodieTableFileIndex.listPartitionPaths(BaseHoodieTableFileIndex.java:281)
at
org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:206)
at
org.apache.hudi.BaseHoodieTableFileIndex.doRefresh(BaseHoodieTableFileIndex.java:383)
at
org.apache.hudi.BaseHoodieTableFileIndex.<init>(BaseHoodieTableFileIndex.java:159)
at
org.apache.hudi.hadoop.HiveHoodieTableFileIndex.<init>(HiveHoodieTableFileIndex.java:52)
at
org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatusForSnapshotMode(HoodieCopyOnWriteTableInputFormat.java:235)
at
org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatus(HoodieCopyOnWriteTableInputFormat.java:142)
at
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
at
org.apache.hudi.hadoop.HoodieParquetInputFormatBase.getSplits(HoodieParquetInputFormatBase.java:68)
at
io.trino.plugin.hive.BackgroundHiveSplitLoader.lambda$loadPartition$3(BackgroundHiveSplitLoader.java:485)
at
io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
at io.trino.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:98)
at
io.trino.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:485)
at
io.trino.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:345)
at
io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:269)
... 6 more
Caused by: java.io.FileNotFoundException: File
hdfs://nameservice1/user/hive/warehouse/zone_dw.db/dws_xxx_refresh_hi/dt=20230906/hh=23
does not exist.
at
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1058)
at
org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
at
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1118)
at
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1115)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1125)
at
org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:270)
at
org.apache.hudi.metadata.FileSystemBackedTableMetadata.lambda$getPartitionPathWithPathPrefix$f0540b37$1(FileSystemBackedTableMetadata.java:111)
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingFlatMapWrapper$1(FunctionWrapper.java:48)
... 46 more
```
/user/hive/warehouse/zone_dw.db/dws_xxx_refresh_hi/dt=20230906/hh=23 is not
exist.
but this sql ,hive and spark can query .
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]