ksoullpwk opened a new issue, #5281:
URL: https://github.com/apache/hudi/issues/5281
**Describe the problem you faced**
The `.hoodie/hoodie.properties` file can be deleted by the retention (lifecycle) settings of a cloud provider's bucket. Are there any configs we can set to refresh this properties file?
**To Reproduce**
Steps to reproduce the behavior:
1. Save data in Hudi format (a sketch of such a write follows this list)
2. Set a retention (lifecycle) policy on the cloud provider's bucket
3. Wait until the retention period has elapsed and `.hoodie/hoodie.properties` has been deleted
4. The next write fails with `org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties from {bucket}/.hoodie/hoodie.properties`
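For context, step 1 is an ordinary Hudi write from Spark. Below is a minimal PySpark sketch, assuming the hudi-spark-bundle jar is on the classpath; the table name, record fields, and bucket path are placeholders, not values from this report.
```
# Minimal PySpark sketch of step 1 (writing a table in Hudi format).
# Assumes the hudi-spark-bundle jar is on the Spark classpath; the table
# name, fields, and GCS path below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-retention-repro").getOrCreate()

df = spark.createDataFrame(
    [("id-1", "2022-04-11 00:00:00", "2022-04-11", "some-payload")],
    ["uuid", "ts", "dt", "payload"],
)

(df.write.format("hudi")
    .option("hoodie.table.name", "example_table")                  # placeholder
    .option("hoodie.datasource.write.recordkey.field", "uuid")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .option("hoodie.datasource.write.partitionpath.field", "dt")
    .mode("append")
    .save("gs://{bucket}/example_table"))                          # placeholder
```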
**Expected behavior**
There should be a way to refresh this properties file so that it is not removed when the retention period expires. In my opinion, Hudi should have some way to mitigate this issue by itself.
**Environment Description**
* Hudi version : 0.9.0
* Spark version : 2.4.4
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : no
**Additional context**
The way I currently mitigate this issue is a simple cron job that refreshes the properties file (by copying the object back to the same path), but I don't think this is the right approach.
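For reference, here is a minimal sketch of that cron-job workaround on GCS, assuming the bucket's lifecycle rule is age-based and the `google-cloud-storage` Python client is available. The bucket name and object path are placeholders, and this only illustrates the workaround described above, not an official Hudi mechanism.
```
# Hedged sketch of the cron-job workaround: copy hoodie.properties onto
# itself so a new object generation (with a fresh creation time) is created
# and an age-based lifecycle rule no longer considers it old enough to delete.
# Bucket name and object path are placeholders.
from google.cloud import storage

BUCKET_NAME = "my-hudi-bucket"                                         # placeholder
PROPERTIES_KEY = "warehouse/example_table/.hoodie/hoodie.properties"   # placeholder

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
blob = bucket.blob(PROPERTIES_KEY)

# Copying the blob onto the same name in the same bucket resets its creation
# time, which is what the age-based retention rule keys off.
bucket.copy_blob(blob, bucket, PROPERTIES_KEY)
```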
**Stacktrace**
```
Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties from {bucket}/.hoodie/hoodie.properties
  at org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:183)
  at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:114)
  at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:74)
  at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:611)
  at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$getHoodieTableConfig$1.apply(HoodieSparkSqlWriter.scala:697)
  at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$getHoodieTableConfig$1.apply(HoodieSparkSqlWriter.scala:697)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.hudi.HoodieSparkSqlWriter$.getHoodieTableConfig(HoodieSparkSqlWriter.scala:695)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:111)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  ...
```