Copilot commented on code in PR #6347:
URL: https://github.com/apache/hive/pull/6347#discussion_r2888148648
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlAggrStats.java:
##########
@@ -184,34 +185,19 @@ public List<ColumnStatisticsObj>
columnStatisticsObjForPartitions(
final double ndvTuner, final boolean enableBitVector,
boolean enableKll) throws MetaException {
final boolean areAllPartsFound = (partsFound == partNames.size());
- return Batchable.runBatched(batchSize, colNames, new Batchable<String,
ColumnStatisticsObj>() {
+ int batch = dbType.getMaxBatch(batchSize, colNames.size() +
partNames.size() + 4);
+ return Batchable.runBatched(batch, colNames, new Batchable<String,
ColumnStatisticsObj>() {
@Override
public List<ColumnStatisticsObj> run(final List<String> inputColNames)
throws MetaException {
- return columnStatisticsObjForPartitionsBatch(catName, dbName,
tableName,
- partNames, inputColNames, engine, areAllPartsFound,
- useDensityFunctionForNDVEstimation, ndvTuner, enableBitVector,
enableKll);
- }
- });
- }
-
- /**
- * Should be called with the list short enough to not trip up Oracle/etc.
- */
- private List<ColumnStatisticsObj> columnStatisticsObjForPartitionsBatch(
- String catName,
- String dbName, String tableName,
- List<String> partNames, List<String> colNames, String engine,
- boolean areAllPartsFound, boolean useDensityFunctionForNDVEstimation,
- double ndvTuner, boolean enableBitVector,
- boolean enableKll) throws MetaException {
- if (enableBitVector || enableKll) {
- return aggrStatsUseJava(catName, dbName, tableName, partNames,
- colNames, engine, areAllPartsFound,
useDensityFunctionForNDVEstimation,
- ndvTuner, enableBitVector, enableKll);
- } else {
- return aggrStatsUseDB(catName, dbName, tableName, partNames, colNames,
engine,
- useDensityFunctionForNDVEstimation, ndvTuner);
- }
+ if (enableBitVector || enableKll) {
+ return aggrStatsUseJava(catName, dbName, tableName, partNames,
+ inputColNames, engine, areAllPartsFound,
useDensityFunctionForNDVEstimation,
+ ndvTuner, enableBitVector, enableKll);
+ } else {
+ return aggrStatsUseDB(catName, dbName, tableName, partNames,
inputColNames, engine,
+ useDensityFunctionForNDVEstimation, ndvTuner);
+ }
+ }});
Review Comment:
`columnStatisticsObjForPartitions` computes a DB-specific batch size but
only applies it to batching over `colNames`. The SQL built in `aggrStatsUseDB`
uses *both* `inputColNames` and the full `partNames` list as bind parameters
(`... in (%1$s) ... in (%2$s)` plus 4 fixed params). If `batchSize` is
`NO_BATCHING` (the default for Postgres today), the unbatched `partNames` can
still exceed PostgreSQL’s 32767 parameter limit, so this change doesn’t fully
address the failure mode.
To actually enforce the parameter limit, apply batching to the `partNames`
dimension as well (or derive `inputColNames` and `inputPartNames` batch sizes
from the same per-statement parameter budget, accounting for the 4 fixed
params).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]