hudi-bot opened a new issue, #16458: URL: https://github.com/apache/hudi/issues/16458
The set of configuration parameters for Compaction service is confusing. In HoodieCompationConfig: * hoodie.compact.inline * hoodie.compact.schedule.inline * hoodie.log.compaction.enable * hoodie.log.compaction.inline * hoodie.compact.inline.max.delta.commits * hoodie.compact.inline.max.delta.seconds * hoodie.compact.inline.trigger.strategy * hoodie.parquet.small.file.limit * hoodie.record.size.estimation.threshold * hoodie.compaction.target.io * hoodie.compaction.logfile.size.threshold * hoodie.compaction.logfile.num.threshold * hoodie.compaction.strategy * hoodie.compaction.daybased.target.partitions * hoodie.copyonwrite.insert.split.size * hoodie.copyonwrite.insert.auto.split * hoodie.copyonwrite.record.size.estimate * hoodie.log.compaction.blocks.threshold In FlinkOptions: * compaction.async.enabled * compaction.schedule.enabled * compaction.delta_commits * compaction.delta_seconds * compaction.trigger.strategy * compaction.target_io * compaction.max_memory * compaction.tasks * compaction.timeout.seconds Need to refactor naming with saving backward compatibility. ## JIRA info - Link: https://issues.apache.org/jira/browse/HUDI-7646 - Type: Improvement - Fix version(s): - 1.1.0 --- ## Comments 22/Apr/24 08:11;geserdugarov;The main question is what options are preferable, with ".inline" or with ".async" naming. The current distribution is the following. Using ".inline": * hoodie.compact.inline * hoodie.compact.schedule.inline * hoodie.log.compaction.inline * hoodie.clustering.inline * hoodie.clustering.schedule.inline * hoodie.partition.ttl.inline Using ".async": * hoodie.clean.async.enabled * clean.async.enabled * compaction.async.enabled * hoodie.kafka.compaction.async.enable * hoodie.clustering.async.enabled * clustering.async.enabled * hoodie.archive.async * hoodie.embed.timeline.server.async * hoodie.metadata.index.async * hoodie.datasource.compaction.async.enable Looks like it's preferable to move toward ".async" option. And from user point of view, it's more obvious what ".async" means in comparing with ".inline", which needs to clarify the Hudi write process for a user.;;; --- 09/May/24 17:32;geserdugarov;Prepared local environment for TPC-H benchmark running. I will research Compaction parameters configuration from user point of view.;;; -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
