Re: [I] [SUPPORT] The cleaning service takes a long time [hudi]

via GitHub Thu, 25 Jul 2024 09:17:39 -0700


nbeeee commented on issue #11680:
URL: https://github.com/apache/hudi/issues/11680#issuecomment-2250867126


   @ad1happy2go 
   The data source is Apache Kudu; data is written incrementally from it to 
the Hudi table through Spark.
   Here are my Hudi configs:
   hoodie.metadata.record.index.enable=true
   hoodie.index.type=RECORD_INDEX
   hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
   hoodie.cleaner.fileversions.retained=10
   hoodie.cleaner.commits.retained=10
   hoodie.datasource.write.hive_style_partitioning=true
   hoodie.combine.before.upsert=false
   hoodie.spark.sql.insert.into.operation=upsert
   hoodie.sql.insert.mode=upsert
   hoodie.datasource.meta.sync.enable=true
   hoodie.datasource.hive_sync.mode=HMS
   hoodie.datasource.hive_sync.metastore.uris=thrift://node1:9083,thrift://node2:9083
   hoodie.clean.automatic=true
   hoodie.clean.async=true
   hoodie.upsert.shuffle.parallelism=50
   hoodie.insert.shuffle.parallelism=50
   hoodie.delete.shuffle.parallelism=50
   hoodie.metadata.record.index.min.filegroup.count=100
   hoodie.metadata.max.reader.memory=2147483648
   hoodie.metadata.max.reader.buffer.size=20971520
   hoodie.write.tagged.record.storage.level=DISK_ONLY
   hoodie.write.status.storage.level=DISK_ONLY
   hoodie.record.index.input.storage.level=DISK_ONLY
   hoodie.metadata.compact.max.delta.commits=50
   hoodie.memory.compaction.max.size=3221225472
   hoodie.memory.merge.max.size=3221225472
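
For reference, a minimal sketch of how options like the ones above could be collected into a dict and handed to a Spark write. The helper name `parse_hudi_props` is hypothetical and not part of Hudi, only a subset of the settings is repeated here, and the commented-out write call assumes a PySpark DataFrame `df` and a table path `base_path`, neither of which appears in this thread.

```python
def parse_hudi_props(text):
    """Parse 'key=value' lines into a dict, skipping blank lines and '#' comments."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        opts[key.strip()] = value.strip()
    return opts


# A subset of the settings listed above, copied verbatim.
PROPS = """
hoodie.index.type=RECORD_INDEX
hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
hoodie.cleaner.fileversions.retained=10
hoodie.clean.automatic=true
hoodie.clean.async=true
hoodie.datasource.hive_sync.metastore.uris=thrift://node1:9083,thrift://node2:9083
"""

hudi_options = parse_hudi_props(PROPS)

# Assumed usage with PySpark (df and base_path are placeholders, not from this thread):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

Keeping the settings in one dict makes it easy to diff the effective options between runs when checking which cleaner settings are actually applied.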


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
