[ https://issues.apache.org/jira/browse/HUDI-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-2214:
---------------------------------
    Labels: pull-request-available  (was: )

> residual temporary files after clustering are not cleaned up
> ------------------------------------------------------------
>
>                 Key: HUDI-2214
>                 URL: https://issues.apache.org/jira/browse/HUDI-2214
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Cleaner
>    Affects Versions: 0.8.0
>         Environment: spark3.1.1
>                      hadoop3.1.1
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> residual temporary files after clustering are not cleaned up
> // test step
> step1: do clustering
> val records1 = recordsToStrings(dataGen.generateInserts("001", 1000)).toList
> val inputDF1: Dataset[Row] = spark.read.json(spark.sparkContext.parallelize(records1, 2))
> inputDF1.write.format("org.apache.hudi")
>   .options(commonOpts)
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY.key(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
>   .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key(), DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
>   // option for clustering
>   .option("hoodie.parquet.small.file.limit", "0")
>   .option("hoodie.clustering.inline", "true")
>   .option("hoodie.clustering.inline.max.commits", "1")
>   .option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824")
>   .option("hoodie.clustering.plan.strategy.small.file.limit", "629145600")
>   .option("hoodie.clustering.plan.strategy.max.bytes.per.group", Long.MaxValue.toString)
>   .option("hoodie.clustering.plan.strategy.target.file.max.bytes", String.valueOf(12 *1024 * 1024L))
>   .option("hoodie.clustering.plan.strategy.sort.columns", "begin_lat, begin_lon")
>   .mode(SaveMode.Overwrite)
>   .save(basePath)
> step2: check the temp dir, we find /tmp/junit1835474867260509758/dataset/.hoodie/.temp/ is not empty
> {color:#FF0000}/tmp/junit1835474867260509758/dataset/.hoodie/.temp/20210723171208{color} is not cleaned up.
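
The check in step2 could be automated along the following lines. This is only a minimal sketch, not code from the Hudi test suite: the names `TempDirCheck` and `residualTempEntries` are hypothetical, and it assumes the table lives on the local filesystem (as with the /tmp/junit... path above), so plain java.nio is enough and no Hadoop FileSystem is involved.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Hudi: lists whatever is left under
// <basePath>/.hoodie/.temp on a local filesystem.
object TempDirCheck {
  /** Sorted names of entries under <basePath>/.hoodie/.temp; empty if the directory is absent. */
  def residualTempEntries(basePath: String): Seq[String] = {
    val tempDir: Path = Paths.get(basePath, ".hoodie", ".temp")
    if (!Files.isDirectory(tempDir)) Seq.empty
    else {
      val stream = Files.list(tempDir)
      // Materialize the names before closing the directory stream.
      try stream.iterator().asScala.map(_.getFileName.toString).toList.sorted
      finally stream.close()
    }
  }
}
```

After step1, a test could assert that `TempDirCheck.residualTempEntries(basePath)` is empty. Going by the report above, on 0.8.0 it would presumably return the leftover instant directory (20210723171208 in this run) instead.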

--
This message was sent by Atlassian Jira
(v8.3.4#803005)