[GitHub] [hudi] jklim96 commented on issue #7254: [SUPPORT] Incremental query performance

GitBox Wed, 23 Nov 2022 17:13:46 -0800


jklim96 commented on issue #7254:
URL: https://github.com/apache/hudi/issues/7254#issuecomment-1325826748


   For the 10-commit query where the incremental query took longer, I've 
manually checked the files and confirmed that ~1600 files were touched out 
of a total of ~14000 files. So the incremental query should theoretically be 
~9x faster than the filter, yet we're seeing incremental queries perform 
worse than the filter.
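
The ~9x figure is simply the ratio of total files to touched files. A minimal sketch of that arithmetic follows, assuming scan time scales linearly with the number of files read and ignoring any fixed per-query overhead; the helper name `expected_speedup` is hypothetical and not part of Hudi.

```python
def expected_speedup(files_touched: int, files_total: int) -> float:
    """Idealized speedup of reading only the touched files versus all files.

    Assumption: scan time is proportional to the number of files read,
    with no fixed per-query overhead.
    """
    if files_touched <= 0 or files_total <= 0:
        raise ValueError("file counts must be positive")
    return files_total / files_touched


# Approximate counts reported above for the 10-commit window.
speedup = expected_speedup(files_touched=1600, files_total=14000)
print(f"{speedup:.2f}x")  # prints "8.75x"
```

This is an upper bound under the stated assumption; it is not a measurement.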
   
   From the [Hudi 
documentation](https://hudi.apache.org/docs/faq/#what-performance-can-i-expect-for-hudi-readingqueries):
   >For incremental views, you can expect speed up relative to how much data 
usually changes in a given time window and how much time your entire scan 
takes. For e.g, if only 100 files changed in the last hour in a partition of 
1000 files, then you can expect a speed of 10x using incremental pull in Hudi 
compared to full scanning the partition to find out new data.
   
   What's being explained in the documentation isn't quite the behaviour we're 
seeing.
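
For reference, the two kinds of read being compared can be sketched as below. This comment does not show the actual queries, so the sketch assumes the "filter" is a snapshot read filtered on the `_hoodie_commit_time` meta column; the helper names and the instant values are hypothetical. The option keys are the Hudi Spark datasource read options for incremental queries.

```python
from typing import Dict, Optional


def incremental_read_options(begin_instant: str,
                             end_instant: Optional[str] = None) -> Dict[str, str]:
    """Build Spark reader options for a Hudi incremental query.

    The begin instant is exclusive and the end instant is inclusive.
    """
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


def commit_time_filter(begin_instant: str,
                       end_instant: Optional[str] = None) -> str:
    """Build the equivalent filter expression for a snapshot read (assumed form)."""
    expr = f"_hoodie_commit_time > '{begin_instant}'"
    if end_instant is not None:
        expr += f" AND _hoodie_commit_time <= '{end_instant}'"
    return expr


# Hypothetical instants bounding a 10-commit window.
begin, end = "20221122000000", "20221123000000"
print(incremental_read_options(begin, end))
print(commit_time_filter(begin, end))

# With a SparkSession `spark` and a table path `base_path` (both assumed):
#   inc = spark.read.format("hudi") \
#       .options(**incremental_read_options(begin, end)).load(base_path)
#   flt = spark.read.format("hudi").load(base_path) \
#       .filter(commit_time_filter(begin, end))
```

Both reads should return the same records for the window, so timing them against each other isolates the cost of the read path rather than a difference in the data returned.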
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
