[ https://issues.apache.org/jira/browse/HUDI-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386121#comment-17386121 ]

ASF GitHub Bot commented on HUDI-2214:
--------------------------------------

hudi-bot commented on pull request #3335:
URL: https://github.com/apache/hudi/pull/3335#issuecomment-885523651


   ## CI report:
   
   * 9bebeaf2c723810d6c6d5df00e4d6f36b4f478e4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> residual temporary files after clustering are not cleaned up
> ------------------------------------------------------------
>
>                 Key: HUDI-2214
>                 URL: https://issues.apache.org/jira/browse/HUDI-2214
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Cleaner
>    Affects Versions: 0.8.0
>         Environment: spark3.1.1
> hadoop3.1.1
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> Residual temporary files after clustering are not cleaned up.
> Test steps:
> step1: do clustering (bulk insert with inline clustering enabled)
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.spark.sql.{Dataset, Row, SaveMode}
>
> val records1 = recordsToStrings(dataGen.generateInserts("001", 1000)).toList
> val inputDF1: Dataset[Row] = spark.read.json(spark.sparkContext.parallelize(records1, 2))
> inputDF1.write.format("org.apache.hudi")
>   .options(commonOpts)
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY.key(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
>   .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key(), DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
>   // options for inline clustering
>   .option("hoodie.parquet.small.file.limit", "0")
>   .option("hoodie.clustering.inline", "true")
>   .option("hoodie.clustering.inline.max.commits", "1")
>   .option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824") // overridden by the later setting below
>   .option("hoodie.clustering.plan.strategy.small.file.limit", "629145600")
>   .option("hoodie.clustering.plan.strategy.max.bytes.per.group", Long.MaxValue.toString)
>   .option("hoodie.clustering.plan.strategy.target.file.max.bytes", String.valueOf(12 * 1024 * 1024L))
>   .option("hoodie.clustering.plan.strategy.sort.columns", "begin_lat, begin_lon")
>   .mode(SaveMode.Overwrite)
>   .save(basePath)
> step2: check the temp dir; /tmp/junit1835474867260509758/dataset/.hoodie/.temp/ is not empty:
> /tmp/junit1835474867260509758/dataset/.hoodie/.temp/20210723171208
> is not cleaned up.
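>
> As a workaround until the fix lands, the residual instant directory can be removed manually. A minimal sketch, assuming the Hadoop FileSystem API and the basePath and spark values from the test above (this is not the actual Hudi fix):
>
> import org.apache.hadoop.fs.Path
>
> // Temporary marker directories that Hudi writes under .hoodie/.temp;
> // one subdirectory per instant time (e.g. 20210723171208).
> val tempDir = new Path(basePath + "/.hoodie/.temp")
> val fs = tempDir.getFileSystem(spark.sparkContext.hadoopConfiguration)
> if (fs.exists(tempDir)) {
>   fs.listStatus(tempDir).foreach { status =>
>     println(s"residual marker dir: ${status.getPath}")
>     fs.delete(status.getPath, true) // recursive delete of the stale markers
>   }
> }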



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
