ksoullpwk opened a new issue, #5281:
URL: https://github.com/apache/hudi/issues/5281
**Describe the problem you faced**
The `.hoodie/hoodie.properties` file can be deleted by the retention (lifecycle) settings of a cloud provider's bucket. Are there any configs we can set to refresh this properties file?
**To Reproduce**
Steps to reproduce the behavior:
1. Save data in Hudi format (a sketch of such a write follows this list)
2. Set a retention (lifecycle) policy on the cloud provider's bucket
3. Wait until the retention period has elapsed and `.hoodie/hoodie.properties` has been deleted
4. The next write fails with `org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties from {bucket}/.hoodie/hoodie.properties`
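For context, step 1 is an ordinary Hudi write from Spark. Below is a minimal PySpark sketch, assuming the hudi-spark-bundle jar is on the classpath; the table name, record fields, and bucket path are placeholders, not values from this report.
```
# Minimal PySpark sketch of step 1 (writing a table in Hudi format).
# Assumes the hudi-spark-bundle jar is on the Spark classpath; the table
# name, fields, and GCS path below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-retention-repro").getOrCreate()

df = spark.createDataFrame(
    [("id-1", "2022-04-11 00:00:00", "2022-04-11", "some-payload")],
    ["uuid", "ts", "dt", "payload"],
)

(df.write.format("hudi")
    .option("hoodie.table.name", "example_table")                  # placeholder
    .option("hoodie.datasource.write.recordkey.field", "uuid")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .option("hoodie.datasource.write.partitionpath.field", "dt")
    .mode("append")
    .save("gs://{bucket}/example_table"))                          # placeholder
```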
**Expected behavior**
There should be a way to refresh this properties file so that it is not removed when the retention period expires. In my opinion, Hudi should have some way to mitigate this issue by itself.
**Environment Description**
* Hudi version : 0.9.0
* Spark version : 2.4.4
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : no
**Additional context**
The way I currently mitigate this issue is a simple cron job that refreshes the properties file (by copying the object back to the same path), but I don't think this is the right approach.
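For reference, here is a minimal sketch of that cron-job workaround on GCS, assuming the bucket's lifecycle rule is age-based and the `google-cloud-storage` Python client is available. The bucket name and object path are placeholders, and this only illustrates the workaround described above, not an official Hudi mechanism.
```
# Hedged sketch of the cron-job workaround: copy hoodie.properties onto
# itself so a new object generation (with a fresh creation time) is created
# and an age-based lifecycle rule no longer considers it old enough to delete.
# Bucket name and object path are placeholders.
from google.cloud import storage

BUCKET_NAME = "my-hudi-bucket"                                         # placeholder
PROPERTIES_KEY = "warehouse/example_table/.hoodie/hoodie.properties"   # placeholder

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
blob = bucket.blob(PROPERTIES_KEY)

# Copying the blob onto the same name in the same bucket resets its creation
# time, which is what the age-based retention rule keys off.
bucket.copy_blob(blob, bucket, PROPERTIES_KEY)
```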
**Stacktrace**
```
Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties from {bucket}/.hoodie/hoodie.properties
  at org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:183)
  at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:114)
  at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:74)
  at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:611)
  at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$getHoodieTableConfig$1.apply(HoodieSparkSqlWriter.scala:697)
  at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$getHoodieTableConfig$1.apply(HoodieSparkSqlWriter.scala:697)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.hudi.HoodieSparkSqlWriter$.getHoodieTableConfig(HoodieSparkSqlWriter.scala:695)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:111)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  ...
```