moritzmeister opened a new issue #4939:
URL: https://github.com/apache/hudi/issues/4939


   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create an empty directory in HDFS and create the corresponding Hive table
   2. Try to write data to that table using HudiDeltaStreamer
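   For reference, a minimal DeltaStreamer invocation of the kind described above might look like this (the jar version, paths, table name, source class, and properties file are placeholders, not taken from the report):
   
   ```shell
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     hudi-utilities-bundle_2.12-0.10.0.jar \
     --table-type COPY_ON_WRITE \
     --target-base-path hdfs://namenode:8020/apps/hive/warehouse/delta_streamer.db/my_table \
     --target-table my_table \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --props /path/to/source.properties
   ```
   
   With an empty target base path (no `.hoodie` directory), this fails at startup with the stacktrace below.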
   
   **Expected behavior**
   
   I would expect the DeltaStreamer to create the `.hoodie` metadata 
directory if it doesn't exist, the same way the Spark Hudi DataSource does.
   
   Traversing the stacktrace, I saw that neither 
`HoodieTableMetaClient.initTableAndGetMetaClient()` nor 
`HoodieTableMetaClient.initTable()` is called, which would create the 
metadata directory if it doesn't exist yet.
   
   Is there a reason why that's the case? I suppose that's what happens in 
Spark and why it works there.
   Can I call `initTable` manually, or would I then be missing other steps as well?
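   
   As a workaround sketch (not a confirmed fix), the table could be initialized manually before starting the DeltaStreamer, using the Hudi 0.10 `HoodieTableMetaClient` property builder. The base path, table name, record key, and precombine field below are placeholders:
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hudi.common.model.HoodieTableType;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   
   public class InitHoodieTable {
     public static void main(String[] args) throws Exception {
       Configuration hadoopConf = new Configuration();
       // Placeholder base path; in the report this is the Hive warehouse table dir
       String basePath = "hdfs://namenode:8020/apps/hive/warehouse/delta_streamer.db/my_table";
   
       // Creates the .hoodie directory and hoodie.properties under basePath
       // if they do not exist yet.
       HoodieTableMetaClient.withPropertyBuilder()
           .setTableType(HoodieTableType.COPY_ON_WRITE)
           .setTableName("my_table")          // placeholder table name
           .setRecordKeyFields("id")          // placeholder record key field
           .setPreCombineField("ts")          // placeholder precombine field
           .initTable(hadoopConf, basePath);
     }
   }
   ```
   
   Whether this leaves the table in a state the DeltaStreamer fully agrees with (e.g. matching key generator and table configs) is exactly the open question above.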
   
   **Environment Description**
   
   * Hudi version : 0.10
   
   * Spark version : 3.1.1
   
   * Hive version : 3.0.0
   
   * Hadoop version : 3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   **Stacktrace**
   
   ```
   org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path hdfs://10.0.2.15:8020/apps/hive/warehouse/delta_streamer.db/card_transactions_10m_agg_15/.hoodie
        at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:57)
        at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:113)
        at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:73)
        at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:614)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:581)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:143)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:115)
        at com.logicalclocks.hsfs.engine.hudi.DeltaStreamerConfig.streamToHoodieTable(DeltaStreamerConfig.java:95)
        at com.logicalclocks.hsfs.engine.hudi.HudiEngine.streamToHoodieTable(HudiEngine.java:264)
        at com.logicalclocks.hsfs.engine.SparkEngine.streamToHudiTable(SparkEngine.java:601)
        at com.logicalclocks.utils.MainClass.main(MainClass.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732)
   Caused by: java.io.FileNotFoundException: File does not exist: hdfs://10.0.2.15:8020/apps/hive/warehouse/delta_streamer.db/card_transactions_10m_agg_15/.hoodie
        at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1338)
        at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1330)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1346)
        at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:393)
        at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:100)
        at org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:387)
        at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51)
        ... 15 more
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.