SamarthRaval opened a new issue, #8925: URL: https://github.com/apache/hudi/issues/8925
Hello everyone,

All of my delta commits are written in under 1 hour, but a lot of additional time is spent deleting the marker directory (shown in the screenshot). I have never understood exactly why this happens. My configuration is as follows:

```
hoodie.datasource.hive_sync.database -> prod_hudi_tier2
hoodie.datasource.hive_sync.mode -> hms
hoodie.datasource.hive_sync.support_timestamp -> true
path -> <s3://transactions.all>_hudi
hoodie.datasource.write.precombine.field -> lastmodifieddate
hoodie.datasource.hive_sync.partition_fields -> warehouse,year,month
hoodie.datasource.write.payload.class -> com.verafin.spark.datalake.NullSafeDefaultHoodieRecordPayload
hoodie.datasource.hive_sync.skip_ro_suffix -> true
hoodie.metadata.enable -> true
hoodie.datasource.hive_sync.table -> transactions_all
hoodie.datasource.meta_sync.condition.sync -> true
hoodie.clean.automatic -> false
hoodie.datasource.write.operation -> upsert
hoodie.datasource.hive_sync.enable -> true
hoodie.datasource.write.recordkey.field -> uuid
hoodie.table.name -> transactions_all
hoodie.datasource.write.table.type -> MERGE_ON_READ
hoodie.datasource.write.hive_style_partitioning -> true
hoodie.datasource.write.reconcile.schema -> true
hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.upsert.shuffle.parallelism -> 5760
hoodie.meta.sync.client.tool.class -> org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
hoodie.datasource.write.partitionpath.field -> warehouse,year,month
hoodie.compact.inline.max.delta.commits -> 25
```

I am also syncing the table to AWS Glue (via `AwsGlueCatalogSyncTool`). Could that be causing the problem? Or could the metadata table (`hoodie.metadata.enable -> true`) be what is taking so long? This is slowing down the entire pipeline.

[Slack Message](https://apache-hudi.slack.com/archives/C4D716NPQ/p1685759631473149?thread_ts=1685759631.473149&cid=C4D716NPQ)
