dengzhhu653 commented on PR #4744: URL: https://github.com/apache/hive/pull/4744#issuecomment-1902476462
> @dengzhhu653 @zhangbutao I made another benchmark test with 2 millions partitions stats written in advance, it shows no obvious performance regression:
>
> ```sql
> mysql> select count(1) from PART_COL_STATS;
> +----------+
> | count(1) |
> +----------+
> |  2000000 |
> +----------+
> 1 row in set (0.40 sec)
> ```
>
> benchmark test:
>
> ```shell
> java -jar ./hmsbench-jar-with-dependencies.jar -H localhost --savedata /tmp/benchdata --sanitize -N 100 -N 1000 -o bench_results_direct.csv -C -d testbench_http --params=100 -E 'drop.*' -E 'renameTable.*' -E 'getTableObjectsByName.*' -E 'listTables.*' -E 'listPartitions.*' -E 'getPartitionsByNames.*' -E 'getPartitionNames.*' -E 'listPartition' -E 'getPartition' -E 'getPartitions' -E 'getPartitions.10' -E 'getPartitions.100' -E 'getPartitions.1000' -E 'addPartition.*' -E 'addPartitions.*' -E 'alterPartitions.*' -E 'getNid' -E 'listDatabases' -E 'getTable' -E 'createTable' -E 'openTxn.*'
> ```
>
> * before this patch
>
> ```shell
> Operation                   Mean     Med      Min      Max      Err%
> getPartitionsStat           5.21167  5.16801  4.92140  6.05965  3.77022
> getPartitionsStat.100       6.93186  6.83728  6.48675  10.2091  6.80759
> getPartitionsStat.1000      15.1901  14.8172  14.3164  19.6772  6.61940
> updatePartitionsStat        9.83066  9.63766  9.27253  16.3278  9.28177
> updatePartitionsStat.100    1009.46  1009.26  991.282  1052.16  0.956140
> updatePartitionsStat.1000   10091.7  10088.1  9929.50  10309.3  0.760790
> ```
>
> * after this patch
>
> ```shell
> Operation                   Mean     Med      Min      Max      Err%
> getPartitionsStat           5.56409  5.49373  5.20583  7.02619  5.03727
> getPartitionsStat.100       6.34526  6.29966  5.97725  7.85943  4.11913
> getPartitionsStat.1000      14.2403  14.1247  13.6040  15.8745  3.02256
> updatePartitionsStat        10.5586  10.3743  9.88599  14.8948  7.01613
> updatePartitionsStat.100    1013.06  1011.71  978.329  1047.57  1.45127
> updatePartitionsStat.1000   9912.52  9905.62  9677.24  10163.9  1.22903
> ```

From what I see in `benchmarkGetPartitionsStat`, it looks like there is only one table with thousands of partitions and column stats. Am I missing something?

I guess the performance regression is caused by the multiple joins after removing the columns. How many databases, tables, and partitions are in the benchmark test?

Thanks,
Zhihua
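
For reference, the mean timings in the two quoted tables can be compared directly. The following is only a minimal sketch: the numbers are copied from the quoted output, the helper name `relative_change` is made up for illustration, and the result should be read alongside the reported `Err%` column rather than taken as a significance test.

```python
# Mean timings copied from the "before this patch" and "after this patch"
# tables quoted above (units as reported by the benchmark tool).
before = {
    "getPartitionsStat": 5.21167,
    "getPartitionsStat.100": 6.93186,
    "getPartitionsStat.1000": 15.1901,
    "updatePartitionsStat": 9.83066,
    "updatePartitionsStat.100": 1009.46,
    "updatePartitionsStat.1000": 10091.7,
}
after = {
    "getPartitionsStat": 5.56409,
    "getPartitionsStat.100": 6.34526,
    "getPartitionsStat.1000": 14.2403,
    "updatePartitionsStat": 10.5586,
    "updatePartitionsStat.100": 1013.06,
    "updatePartitionsStat.1000": 9912.52,
}


def relative_change(before, after):
    """Percent change of the mean per operation (hypothetical helper).

    Positive values mean the operation was slower after the patch.
    """
    return {op: (after[op] - before[op]) / before[op] * 100.0 for op in before}


changes = relative_change(before, after)
for op, pct in changes.items():
    print(f"{op:<28}{pct:+7.2f}%")
```

Since all of these runs use the single-table layout, the comparison says nothing about how the joins behave with many databases and tables.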
