harshi2506 opened a new issue #1552:
URL: https://github.com/apache/hudi/issues/1552


   Hi,
   I am using DataSourceWriter for HUDI compaction. I populated a 30 GB table 
with around 1 billion rows. It created around 6000 partitions, with files 
ranging from 2 MB to 100 MB in size. I then tried upserting 1 GB of data; it 
touched around 256 partitions and rewrote about 1/20th of the records, but it 
took almost an hour and a half. If I remove the partitions and load everything 
into the default partition, the upsert takes 14 minutes but touches almost 
1/5th of the data.
   I have attached screenshots of the jobs and of the Hudi commits.
   I noticed a big time gap between job 7 and job 8: the last job is submitted 
after 30 minutes. Why is there such a big gap, and am I missing something?
   <img width="1105" alt="Screenshot 2020-04-22 at 2 24 08 PM" 
src="https://user-images.githubusercontent.com/64137937/79974810-90b6c480-84b7-11ea-936e-fb497cf1dccb.png">
   <img width="1229" alt="Screenshot 2020-04-22 at 2 23 47 PM" 
src="https://user-images.githubusercontent.com/64137937/79974841-9d3b1d00-84b7-11ea-919e-2934b1634e01.png">
   

