rubenssoto opened a new issue #1957:
URL: https://github.com/apache/hudi/issues/1957


   Hi Guys,
   
   Sometimes my Hudi upserts take a very long time. The job for this table used
   to run in less than 2 minutes, and I see the same behavior on all tables. I
   think the slow step is a delete operation that removes old commits, but why
   does it take so long? The files are only 20 MB.
   
   <img width="1680" alt="Captura de Tela 2020-08-12 às 15 58 08" 
src="https://user-images.githubusercontent.com/36298331/90056538-6c286780-dcb5-11ea-88b3-3382b25721a4.png">
   <img width="1680" alt="Captura de Tela 2020-08-12 às 15 57 38" 
src="https://user-images.githubusercontent.com/36298331/90056549-6e8ac180-dcb5-11ea-9694-95b185fc2362.png">
   <img width="1671" alt="Captura de Tela 2020-08-12 às 15 57 21" 
src="https://user-images.githubusercontent.com/36298331/90056550-6fbbee80-dcb5-11ea-9d05-6e4c6b2e096e.png">
   
   
   
   hudi_options = {
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.recordkey.field': 'id',
       'hoodie.datasource.write.table.name': table_name,
       'hoodie.datasource.write.operation': 'upsert',
       'hoodie.combine.before.upsert': 'true',
       'hoodie.datasource.write.precombine.field': 'LineCreatedTimestamp',
       'hoodie.parquet.small.file.limit': 200000000,
       'hoodie.parquet.max.file.size': 256000000,
       'hoodie.parquet.block.size': 256000000,
       'hoodie.cleaner.commits.retained': 10,
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.table': table_name,
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
       'hoodie.datasource.hive_sync.database': 'datalake_raw',
       'hoodie.datasource.hive_sync.jdbcurl': 'jdbc:hive2://ip-10-0-94-214.us-west-2.compute.internal:10000',
       'hoodie.copyonwrite.record.size.estimate': 512,
       'hoodie.insert.shuffle.parallelism': 10,
       'hoodie.upsert.shuffle.parallelism': 10
   }
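
For context, here is a minimal sketch of how an options dict like the one above is typically handed to the Spark DataFrame writer for an upsert. The helper name `write_upsert`, the DataFrame `df`, and the `base_path` argument are assumptions for illustration; they are not taken from my actual job.

```python
def write_upsert(df, hudi_options, base_path):
    """Write `df` to a Hudi table at `base_path` using the given options.

    `df` is expected to be a Spark DataFrame (anything exposing the same
    `.write.format(...).options(...).mode(...).save(...)` chain works).
    `base_path` is a hypothetical table location, e.g. an S3 prefix.
    """
    (
        df.write
        .format("hudi")            # Hudi's Spark datasource
        .options(**hudi_options)   # the 'hoodie.*' settings, passed as-is
        .mode("append")            # append mode so the upsert operation applies
        .save(base_path)
    )
```

Called as `write_upsert(df, hudi_options, base_path)`, this runs the write with `hoodie.datasource.write.operation` set to `upsert` as configured in the dict.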



