nsivabalan commented on PR #17796:
URL: https://github.com/apache/hudi/pull/17796#issuecomment-3722209542
Deliverables by Friday:
- Focus on just one commit to the MDT. We need HFiles in the latest file slices
of the MDT (files and col stats) so that we can measure the best possible read
latencies for query pruning.
- Ensure we can support date and tenantId predicates in queries.
- Generate col stats records using the Spark engine context.
- The benchmarking script should be able to run either the write or the read
benchmarks.
- Let's validate 1M files and 360 partitions. If we run into scale issues,
at least try to find the inflection point, e.g. can we do 100k files?
- Resources:
  - driver: 6 or gb. Executors: 4 cores, 8 GB; if not, 3 cores, 9 GB.
- Disable partition stats and address the other feedback comments. Pavithran
and Vamsi to sync up.
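
A rough sketch of how the benchmarking script's entry point could cover the mode switch and the scale probing described above. All names here (`--mode`, `--num-files`, `--num-partitions`, `files_per_partition`, `scale_ladder`) are hypothetical and not taken from the PR, and the actual write/read benchmark bodies are left as placeholders:

```python
import argparse


def files_per_partition(total_files, partitions):
    """Split total_files across partitions as evenly as possible."""
    base, extra = divmod(total_files, partitions)
    return [base + 1 if i < extra else base for i in range(partitions)]


def scale_ladder(target_files, fallback_files):
    """File counts to try, smallest first, doubling until the target is reached."""
    steps = []
    n = fallback_files
    while n < target_files:
        steps.append(n)
        n *= 2
    steps.append(target_files)
    return steps


def parse_args(argv=None):
    # Hypothetical flags; the real script may name these differently.
    parser = argparse.ArgumentParser(description="MDT benchmark driver (sketch)")
    parser.add_argument("--mode", choices=["write", "read"], required=True)
    parser.add_argument("--num-files", type=int, default=1_000_000)
    parser.add_argument("--num-partitions", type=int, default=360)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    layout = files_per_partition(args.num_files, args.num_partitions)
    if args.mode == "write":
        # Placeholder: generate col stats records and do the single MDT commit.
        return f"write: {sum(layout)} files in {len(layout)} partitions"
    # Placeholder: run pruning queries with date and tenantId predicates.
    return f"read: {sum(layout)} files in {len(layout)} partitions"


if __name__ == "__main__":
    print(main())
```

If the 1M-file run hits scale issues, stepping through `scale_ladder(1_000_000, 100_000)` is one way to narrow down where behavior changes; the doubling step is an arbitrary choice.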