harshi2506 opened a new issue #1552: URL: https://github.com/apache/hudi/issues/1552
Hi, I am using DataSourceWriter for HUDI compaction. I populated a 30 GB table with around 1 billion rows. It created around 6000 partitions, with file sizes ranging from 2 MB to 100 MB.

I tried upserting 1 GB of data; it touched around 256 partitions and rewrote about 1/20th of the records, but it took almost an hour and a half. If I remove the partitioning and load everything into the default partition, it takes 14 minutes, but it touches almost 1/5th of the data.

I have attached screenshots of the jobs and of the Hudi commits. I noticed a big time gap between job 7 and job 8: the last job was submitted after 30 minutes. Why is there such a big gap, and am I missing something?

<img width="1105" alt="Screenshot 2020-04-22 at 2 24 08 PM" src="https://user-images.githubusercontent.com/64137937/79974810-90b6c480-84b7-11ea-936e-fb497cf1dccb.png">
<img width="1229" alt="Screenshot 2020-04-22 at 2 23 47 PM" src="https://user-images.githubusercontent.com/64137937/79974841-9d3b1d00-84b7-11ea-919e-2934b1634e01.png">
