Re: [PR] HIVE-29482: Direct SQL fails when number of query parameters exceeds … [hive]

via GitHub Wed, 04 Mar 2026 23:22:23 -0800


Copilot commented on code in PR #6347:
URL: https://github.com/apache/hive/pull/6347#discussion_r2888148648



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlAggrStats.java:
##########
@@ -184,34 +185,19 @@ public List<ColumnStatisticsObj> 
columnStatisticsObjForPartitions(
       final double ndvTuner, final boolean enableBitVector,
       boolean enableKll) throws MetaException {
     final boolean areAllPartsFound = (partsFound == partNames.size());
-    return Batchable.runBatched(batchSize, colNames, new Batchable<String, 
ColumnStatisticsObj>() {
+    int batch = dbType.getMaxBatch(batchSize, colNames.size() + 
partNames.size() + 4);
+    return Batchable.runBatched(batch, colNames, new Batchable<String, 
ColumnStatisticsObj>() {
       @Override
       public List<ColumnStatisticsObj> run(final List<String> inputColNames) 
throws MetaException {
-        return columnStatisticsObjForPartitionsBatch(catName, dbName, 
tableName,
-            partNames, inputColNames, engine, areAllPartsFound,
-            useDensityFunctionForNDVEstimation, ndvTuner, enableBitVector, 
enableKll);
-      }
-    });
-  }
-
-  /**
-   * Should be called with the list short enough to not trip up Oracle/etc.
-   */
-  private List<ColumnStatisticsObj> columnStatisticsObjForPartitionsBatch(
-      String catName,
-      String dbName, String tableName,
-      List<String> partNames, List<String> colNames, String engine,
-      boolean areAllPartsFound, boolean useDensityFunctionForNDVEstimation,
-      double ndvTuner, boolean enableBitVector,
-      boolean enableKll) throws MetaException {
-    if (enableBitVector || enableKll) {
-      return aggrStatsUseJava(catName, dbName, tableName, partNames,
-          colNames, engine, areAllPartsFound, 
useDensityFunctionForNDVEstimation,
-          ndvTuner, enableBitVector, enableKll);
-    } else {
-      return aggrStatsUseDB(catName, dbName, tableName, partNames, colNames, 
engine,
-          useDensityFunctionForNDVEstimation, ndvTuner);
-    }
+        if (enableBitVector || enableKll) {
+          return aggrStatsUseJava(catName, dbName, tableName, partNames,
+              inputColNames, engine, areAllPartsFound, 
useDensityFunctionForNDVEstimation,
+              ndvTuner, enableBitVector, enableKll);
+        } else {
+          return aggrStatsUseDB(catName, dbName, tableName, partNames, 
inputColNames, engine,
+              useDensityFunctionForNDVEstimation, ndvTuner);
+        }
+      }});

Review Comment:
   `columnStatisticsObjForPartitions` computes a DB-specific batch size but 
only applies it to batching over `colNames`. The SQL built in `aggrStatsUseDB` 
uses *both* `inputColNames` and the full `partNames` list as bind parameters 
(`... in (%1$s) ... in (%2$s)` plus 4 fixed params). If `batchSize` is 
`NO_BATCHING` (the default for Postgres today), the unbatched `partNames` can 
still exceed PostgreSQL’s 32767 parameter limit, so this change doesn’t fully 
address the failure mode.
   
   To actually enforce the parameter limit, apply batching to the `partNames` 
dimension as well (or derive `inputColNames` and `inputPartNames` batch sizes 
from the same per-statement parameter budget, accounting for the 4 fixed 
params).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-29482: Direct SQL fails when number of query parameters exceeds … [hive]

Reply via email to