Re: [PR] MDT Test framework without writing data files [hudi]

via GitHub Wed, 07 Jan 2026 22:16:55 -0800


nsivabalan commented on PR #17796:
URL: https://github.com/apache/hudi/pull/17796#issuecomment-3722209542


   Deliverables by Friday:
   - Focus on just one commit to the MDT. We need HFiles in the latest file 
slices of the MDT (files and col stats) so that we can measure the best 
possible read latencies for query pruning.
   - Ensure we can support date and tenantId predicates in queries.
   - Generate col stats records using the Spark engine context.
   - The benchmarking script should be able to run either the writer or the 
read benchmark.
   - Let's validate 1M files and 360 partitions. If we run into scale issues, 
at least try to find the inflection point, e.g., can we do 100k files?
   - Resources: 
      -  driver: 6 or gb. executors: 4 cores, 8 GB; if not, 3 cores, 9 GB.
   
   - Disabling partition stats and other feedback comments: Pavithran and 
Vamsi to sync up.
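
   As a rough illustration of the script and scale items above, here is a minimal Python sketch of a benchmark driver that runs either the writer or the read benchmark and spreads the target file count across partitions. Everything in it is an assumption made for illustration: the flag names (`--mode`, `--num-files`, `--num-partitions`), the function names, and the even-split strategy are hypothetical and are not taken from the PR or from any Hudi API. The two benchmark functions are placeholders that only describe what they would run.

```python
import argparse


def build_parser():
    # Hypothetical CLI; the flag names are illustrative, not from the PR.
    parser = argparse.ArgumentParser(description="MDT benchmark driver (sketch)")
    parser.add_argument("--mode", choices=["write", "read"], required=True,
                        help="which benchmark to run")
    parser.add_argument("--num-files", type=int, default=1_000_000)
    parser.add_argument("--num-partitions", type=int, default=360)
    return parser


def files_per_partition(num_files, num_partitions):
    # Spread files as evenly as possible; the first `extra` partitions
    # receive one additional file so the counts sum to num_files.
    base, extra = divmod(num_files, num_partitions)
    return [base + 1 if i < extra else base for i in range(num_partitions)]


def run_write_benchmark(counts):
    # Placeholder: a real writer benchmark would create the MDT commit here.
    return f"write: {sum(counts)} files across {len(counts)} partitions"


def run_read_benchmark(counts):
    # Placeholder: a real read benchmark would time pruning queries here.
    return f"read: {sum(counts)} files across {len(counts)} partitions"


def main(argv=None):
    args = build_parser().parse_args(argv)
    counts = files_per_partition(args.num_files, args.num_partitions)
    if args.mode == "write":
        return run_write_benchmark(counts)
    return run_read_benchmark(counts)
```

   Calling `main(["--mode", "read", "--num-files", "100000"])` would exercise the read path at the smaller fallback scale. With the default 1M files and 360 partitions, the even split gives either 2777 or 2778 files per partition.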
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
