moritzmeister opened a new issue #4939:
URL: https://github.com/apache/hudi/issues/4939
**To Reproduce**
Steps to reproduce the behavior:
1. Create an empty directory in HDFS and create the corresponding Hive table
2. Try to write data to that table using HoodieDeltaStreamer
**Expected behavior**
I would expect the DeltaStreamer to create the `.hoodie` metadata
directory if it doesn't exist, the same way the Hudi Spark DataSource
does.
Traversing the stacktrace, I saw that neither
`HoodieTableMetaClient.initTableAndGetMetaClient()` nor
`HoodieTableMetaClient.initTable()` is called, which would create the
metadata directory if it does not exist yet.
Is there a reason why they are not called? I suppose the Spark DataSource
makes that call, which is why it works there.
Can I call `initTable` manually, or would I then be missing other steps?
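The expected behavior can be sketched as a simple guard: check for the `.hoodie` folder under the table base path and create it only when it is missing. The sketch below uses plain `java.nio` on a local path as a stand-in for HDFS. `MetaDirGuard` and `ensureMetaDir` are hypothetical names, not Hudi API, and a real table initialization probably does more than create the directory (for example, writing table properties), so this only illustrates the check-then-create step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hudi: illustrates "create the
// metadata folder if it is missing" on a local filesystem.
public class MetaDirGuard {
    // Folder name taken from the path in the stacktrace.
    static final String META_FOLDER = ".hoodie";

    /**
     * Ensures basePath/.hoodie exists.
     * Returns true if the folder had to be created, false if it was already there.
     */
    static boolean ensureMetaDir(Path basePath) {
        Path metaPath = basePath.resolve(META_FOLDER);
        if (Files.isDirectory(metaPath)) {
            return false; // already initialized, nothing to do
        }
        try {
            // Creates the base path as well if it does not exist yet.
            Files.createDirectories(metaPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"),
                "hudi-sketch-" + System.nanoTime());
        System.out.println("created on first call:  " + ensureMetaDir(base));
        System.out.println("created on second call: " + ensureMetaDir(base));
    }
}
```

The guard is idempotent: calling it against an already initialized path leaves the folder untouched, which is the property one would want before constructing the meta client.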
**Environment Description**
* Hudi version : 0.10
* Spark version : 3.1.1
* Hive version : 3.0.0
* Hadoop version : 3
* Storage (HDFS/S3/GCS..) : HDFS
**Stacktrace**
```
org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path hdfs://10.0.2.15:8020/apps/hive/warehouse/delta_streamer.db/card_transactions_10m_agg_15/.hoodie
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:57)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:113)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:73)
    at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:614)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:581)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:143)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:115)
    at com.logicalclocks.hsfs.engine.hudi.DeltaStreamerConfig.streamToHoodieTable(DeltaStreamerConfig.java:95)
    at com.logicalclocks.hsfs.engine.hudi.HudiEngine.streamToHoodieTable(HudiEngine.java:264)
    at com.logicalclocks.hsfs.engine.SparkEngine.streamToHudiTable(SparkEngine.java:601)
    at com.logicalclocks.utils.MainClass.main(MainClass.java:81)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://10.0.2.15:8020/apps/hive/warehouse/delta_streamer.db/card_transactions_10m_agg_15/.hoodie
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1338)
    at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1330)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1346)
    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:393)
    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:100)
    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:387)
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51)
    ... 15 more
```