rubenssoto opened a new issue #1957: URL: https://github.com/apache/hudi/issues/1957
Hi guys,

Sometimes my Hudi upserts take a very long time. The job for this table used to run in less than 2 minutes, and I see the same behavior on all my tables. I think the time is spent in a delete operation that removes old commits, but why does it take so long? The files are only 20 MB.

<img width="1680" alt="Captura de Tela 2020-08-12 às 15 58 08" src="https://user-images.githubusercontent.com/36298331/90056538-6c286780-dcb5-11ea-88b3-3382b25721a4.png">
<img width="1680" alt="Captura de Tela 2020-08-12 às 15 57 38" src="https://user-images.githubusercontent.com/36298331/90056549-6e8ac180-dcb5-11ea-9694-95b185fc2362.png">
<img width="1671" alt="Captura de Tela 2020-08-12 às 15 57 21" src="https://user-images.githubusercontent.com/36298331/90056550-6fbbee80-dcb5-11ea-9d05-6e4c6b2e096e.png">

hudi_options = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.table.name': table_name,
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.combine.before.upsert': 'true',
    'hoodie.datasource.write.precombine.field': 'LineCreatedTimestamp',
    'hoodie.parquet.small.file.limit': 200000000,
    'hoodie.parquet.max.file.size': 256000000,
    'hoodie.parquet.block.size': 256000000,
    'hoodie.cleaner.commits.retained': 10,
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': table_name,
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
    'hoodie.datasource.hive_sync.database': 'datalake_raw',
    'hoodie.datasource.hive_sync.jdbcurl': 'jdbc:hive2://ip-10-0-94-214.us-west-2.compute.internal:10000',
    'hoodie.copyonwrite.record.size.estimate': 512,
    'hoodie.insert.shuffle.parallelism': 10,
    'hoodie.upsert.shuffle.parallelism': 10
}

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
