kasakrisz commented on PR #6089: URL: https://github.com/apache/hive/pull/6089#issuecomment-3575499587
@ramitg254 Thanks for reporting this bug and working on the fix. I have a few questions to get a better picture of the issue:

> I ran a script which executes the queries of pattern cbo_* and query* under perf directory with

1. Did you actually execute all TPC-DS queries or just compile them? The driver `TestTezTPCDS30TBPerfCliDriver` doesn't execute the queries since the data is not available. It uses a Postgres HMS backend db dump to simulate an environment where the TPC-DS schema exists and calls Hive's SQL compiler using the `explain` and `explain cbo` commands.
2. Do the numbers you shared in the tables (Apache master version, with aggrStatsUseDB/without aggrStatsUseDB and with batching/without batching) show the overall compilation time of all queries?
3. I haven't found any new tests in this patch, no golden file changes either. Could you please provide a minimal repro of the issue? Please don't copy-paste any q file of the tpc-ds queries. IIUC this issue should be reproducible with a table having a few partitions and a batch size smaller than the number of partitions. Unit tests are also welcome.
