Copilot commented on code in PR #6347:
URL: https://github.com/apache/hive/pull/6347#discussion_r2887932956
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:
##########
@@ -1025,12 +1024,12 @@ private List<Partition> getPartitionsByNames(String catName, String dbName,
         + " left outer join " + SERDES + " on " + SDS + ".\"SERDE_ID\" = " + SERDES + ".\"SERDE_ID\" "
         + " inner join " + TBLS + " on " + TBLS + ".\"TBL_ID\" = " + PARTITIONS + ".\"TBL_ID\" "
         + " inner join " + DBS + " on " + DBS + ".\"DB_ID\" = " + TBLS + ".\"DB_ID\" "
-        + " where \"PART_NAME\" in (" + quotedPartNames + ") "
+        + " where \"PART_NAME\" in (" + makeParams(partNameList.size()) + ") "
         + " and " + TBLS + ".\"TBL_NAME\" = ? and " + DBS + ".\"NAME\" = ? and " + DBS + ".\"CTLG_NAME\" = ? order by \"PART_NAME\" asc";
-
-    Object[] params = new Object[]{tblName, dbName, catName};
-    return getPartitionsByQuery(catName, dbName, tblName, queryText, params, isAcidTable, args);
+    List params = new ArrayList<>(partNameList);
Review Comment:
`List params = new ArrayList<>(partNameList);` uses a raw type even though the list is intended to hold parameters of a known type (all are `String` here). Use a typed collection (e.g., `List<Object>` or `List<String>`) to avoid rawtype warnings and keep the parameter construction type-safe.
```suggestion
List<String> params = new ArrayList<>(partNameList);
```
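To make the suggestion concrete, here is a minimal standalone sketch of the typed construction (the `buildParams` helper and its framing as a separate class are hypothetical, not code from the PR; the names `partNameList`, `tblName`, `dbName`, and `catName` follow the surrounding snippet). `List<Object>` is used rather than `List<String>` so the same list could later carry non-String JDBC parameters without another rawtype regression:

```java
import java.util.ArrayList;
import java.util.List;

public class ParamListSketch {
  // Hypothetical helper: assembles the positional JDBC parameters for the
  // PART_NAME IN (...) query, in the order the '?' placeholders appear:
  // all partition names first, then TBL_NAME, NAME (db), and CTLG_NAME.
  public static List<Object> buildParams(List<String> partNameList,
      String tblName, String dbName, String catName) {
    // Typed construction: no rawtype warning, and the compiler checks
    // everything added to the list.
    List<Object> params = new ArrayList<>(partNameList);
    params.add(tblName);
    params.add(dbName);
    params.add(catName);
    return params;
  }
}
```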
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlAggrStats.java:
##########
@@ -184,34 +185,19 @@ public List<ColumnStatisticsObj> columnStatisticsObjForPartitions(
       final double ndvTuner, final boolean enableBitVector, boolean enableKll) throws MetaException {
     final boolean areAllPartsFound = (partsFound == partNames.size());
-    return Batchable.runBatched(batchSize, colNames, new Batchable<String, ColumnStatisticsObj>() {
+    int batch = dbType.getMaxBatch(batchSize, colNames.size() + partNames.size() + 4);
+    return Batchable.runBatched(batch, colNames, new Batchable<String, ColumnStatisticsObj>() {
       @Override
       public List<ColumnStatisticsObj> run(final List<String> inputColNames) throws MetaException {
-        return columnStatisticsObjForPartitionsBatch(catName, dbName, tableName,
-            partNames, inputColNames, engine, areAllPartsFound,
-            useDensityFunctionForNDVEstimation, ndvTuner, enableBitVector, enableKll);
-      }
-    });
-  }
-
-  /**
-   * Should be called with the list short enough to not trip up Oracle/etc.
-   */
-  private List<ColumnStatisticsObj> columnStatisticsObjForPartitionsBatch(
-      String catName,
-      String dbName, String tableName,
-      List<String> partNames, List<String> colNames, String engine,
-      boolean areAllPartsFound, boolean useDensityFunctionForNDVEstimation,
-      double ndvTuner, boolean enableBitVector,
-      boolean enableKll) throws MetaException {
-    if (enableBitVector || enableKll) {
-      return aggrStatsUseJava(catName, dbName, tableName, partNames,
-          colNames, engine, areAllPartsFound, useDensityFunctionForNDVEstimation,
-          ndvTuner, enableBitVector, enableKll);
-    } else {
-      return aggrStatsUseDB(catName, dbName, tableName, partNames, colNames, engine,
+        if (enableBitVector || enableKll) {
+          return aggrStatsUseJava(catName, dbName, tableName, partNames,
+              colNames, engine, areAllPartsFound, useDensityFunctionForNDVEstimation,
+              ndvTuner, enableBitVector, enableKll);
+        } else {
+          return aggrStatsUseDB(catName, dbName, tableName, partNames, colNames, engine,
Review Comment:
The batching callback ignores the per-batch `inputColNames` and instead passes the full `colNames` list into `aggrStatsUseJava`/`aggrStatsUseDB`. This defeats batching (it can still exceed the DB parameter limit) and will duplicate results, because each batch run processes all columns again. Use `inputColNames` in the per-batch query/aggregation paths (and only compute the batch size based on the parameters for that single batch).
```suggestion
              inputColNames, engine, areAllPartsFound, useDensityFunctionForNDVEstimation,
              ndvTuner, enableBitVector, enableKll);
        } else {
          return aggrStatsUseDB(catName, dbName, tableName, partNames, inputColNames, engine,
```
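To illustrate the failure mode in isolation, here is a minimal generic sketch (a hypothetical stand-in, not the real Hive `Batchable` API): a correct runner hands each chunk to the callback, so a callback that ignores its chunk argument and reprocesses the full input emits one full copy of the results per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchingSketch {
  // Minimal stand-in for Batchable.runBatched: splits the input into chunks
  // of at most batchSize items, hands each chunk (not the full list) to the
  // callback, and concatenates the per-chunk results.
  public static <I, R> List<R> runBatched(int batchSize, List<I> input,
      Function<List<I>, List<R>> callback) {
    List<R> result = new ArrayList<>();
    for (int i = 0; i < input.size(); i += batchSize) {
      List<I> chunk = input.subList(i, Math.min(i + batchSize, input.size()));
      result.addAll(callback.apply(chunk));
    }
    return result;
  }
}
```

A callback written as `chunk -> process(chunk)` yields exactly one result per input item; one written as `chunk -> process(fullList)` yields `numBatches * fullList.size()` results, which is the duplication the review describes.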
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java:
##########
@@ -852,10 +852,23 @@ public int getMaxRows(int batch, int paramSize) {
       // SQL Server supports a maximum of 2100 parameters in a request. Adjust the maxRowsInBatch accordingly
       int maxAllowedRows = (2100 - paramSize) / paramSize;
       return Math.min(batch, maxAllowedRows);
+    } else if (isPOSTGRES()) {
+      int maxAllowedRows = (32767 - paramSize) / paramSize;
+      return Math.min(batch, maxAllowedRows);
     }
     return batch;
   }
+  public int getMaxBatch(int batch, int totalParameters) {
+    int minBatch = batch;
+    if (isSQLSERVER()) {
+      minBatch = (totalParameters + 2100) / 2100;
+    } else if (isPOSTGRES()) {
+      minBatch = (totalParameters + 32767) / 32767;
+    }
+    return batch <= 0 ? minBatch : Math.max(batch, minBatch);
Review Comment:
`getMaxBatch` currently computes a value that looks like a *number of batches* (ceil(totalParameters/limit)) and is then used as the `Batchable.runBatched` *batch size* throughout the PR. For `batchSize == NO_BATCHING (-1)` this will often return `1`, turning what used to be a single query into N single-item queries (a major performance regression), and it still does not enforce the per-statement parameter limit. Rework this helper to return a safe *batch size* (max items per batch) based on the DB parameter limit and the fixed/variable parameter counts, and ensure it never returns 0.
```suggestion
    // totalParameters is treated as the per-item parameter count. Compute the maximum
    // number of items we can safely include in a single batch without exceeding the
    // database's per-statement parameter limit.
    int maxItemsPerBatch;
    if (totalParameters <= 0) {
      // If we cannot determine per-item parameter usage, fall back to the configured
      // batch size without applying DB-specific limits.
      return batch;
    }
    if (isSQLSERVER()) {
      // SQL Server supports a maximum of 2100 parameters in a request.
      maxItemsPerBatch = 2100 / totalParameters;
    } else if (isPOSTGRES()) {
      // PostgreSQL supports a maximum of 32767 parameters in a request.
      maxItemsPerBatch = 32767 / totalParameters;
    } else {
      // No known hard parameter limit; use the configured batch size as-is.
      return batch;
    }
    // Ensure we never return 0; at least one item per batch is always allowed.
    if (maxItemsPerBatch <= 0) {
      maxItemsPerBatch = 1;
    }
    // If batch <= 0 (e.g., NO_BATCHING), choose the largest safe batch size that
    // respects the DB parameter limit. Otherwise, cap the requested batch size at
    // the safe maximum.
    if (batch <= 0) {
      return maxItemsPerBatch;
    }
    return Math.min(batch, maxItemsPerBatch);
```
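As a compact, testable sketch of the suggested rework (the names and the exact parameter accounting are assumptions for illustration, not the PR's code): compute the maximum items per batch from the DB's per-statement parameter limit (e.g., 2100 for SQL Server, 32767 for PostgreSQL), reserve room for fixed parameters outside the IN list, clamp to at least 1, and treat a non-positive requested batch (`NO_BATCHING`) as "as large as safely possible".

```java
public class BatchSizeSketch {
  // Hypothetical helper: returns a safe *batch size* (max items per batch),
  // never a number of batches and never 0.
  //   requestedBatch : configured batch size; <= 0 means NO_BATCHING
  //   paramLimit     : DB per-statement parameter limit (2100, 32767, ...)
  //   fixedParams    : parameters outside the per-item IN list (tbl/db/cat, ...)
  //   paramsPerItem  : parameters consumed by each batched item
  public static int safeBatchSize(int requestedBatch, int paramLimit,
      int fixedParams, int paramsPerItem) {
    if (paramsPerItem <= 0) {
      // Cannot reason about per-item usage; leave the configured size alone.
      return requestedBatch;
    }
    // Items that fit after reserving room for the fixed parameters,
    // clamped so we always allow at least one item per batch.
    int maxItems = Math.max(1, (paramLimit - fixedParams) / paramsPerItem);
    // NO_BATCHING means "as large as safely possible"; otherwise cap the
    // requested size at the safe maximum.
    return requestedBatch <= 0 ? maxItems : Math.min(requestedBatch, maxItems);
  }
}
```

With this shape, `safeBatchSize(-1, 2100, 3, 1)` yields 2097 (one large batch under SQL Server's limit) instead of collapsing `NO_BATCHING` to single-item queries.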
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/MetaToolObjectStore.java:
##########
@@ -842,20 +845,21 @@ private void collectTabFormatSummary(List<Long>
transactionalTables, Long tableI
private void collectBasicStats(Map<Long, MetadataTableSummary> summaries,
Set<Long> nonPartedTabs,
Set<Long> partedTabs) throws MetaException {
+ int batch = dbType.getMaxBatch(batchSize, nonPartedTabs.size());
String queryText0 = "select \"TBL_ID\", \"PARAM_KEY\", CAST(" +
dbType.toVarChar("\"PARAM_VALUE\"") + " AS decimal(21,0)) from \"TABLE_PARAMS\"
where \"PARAM_KEY\" " +
"in ('" + StatsSetupConst.TOTAL_SIZE + "', '" +
StatsSetupConst.NUM_FILES + "', '" + StatsSetupConst.ROW_COUNT + "') and
\"TBL_ID\" in (";
- runBatched(batchSize, new ArrayList<>(nonPartedTabs), new Batchable<Long,
Void>() {
+ runBatched(batch, new ArrayList<>(nonPartedTabs), new Batchable<Long,
Void>() {
@Override
public List<Void> run(List<Long> input) throws Exception {
collectBasicStats(queryText0, input, summaries, "");
return Collections.emptyList();
}
});
-
+ batch = dbType.getMaxBatch(batchSize, partedTabs.size());
String queryText1 = "select \"TBL_ID\", \"PARAM_KEY\", sum(CAST(" +
dbType.toVarChar("\"PARAM_VALUE\"") + " AS decimal(21,0))) from \"PARTITIONS\"
t " +
"join \"PARTITION_PARAMS\" p on p.\"PART_ID\" = t.\"PART_ID\" where
\"PARAM_KEY\" " +
"in ('" + StatsSetupConst.TOTAL_SIZE + "', '" +
StatsSetupConst.NUM_FILES + "', '" + StatsSetupConst.ROW_COUNT + "') and
t.\"TBL_ID\" in (";
- runBatched(batchSize, new ArrayList<>(partedTabs), new Batchable<Long,
Void>() {
+ runBatched(batch, new ArrayList<>(partedTabs), new Batchable<Long, Void>() {
@Override
Review Comment:
Indentation is inconsistent for the second half of the method (`batch = ...`, `String queryText1 = ...`, and the following `runBatched` call). If this module is checked by checkstyle/spotless, this will likely fail formatting checks. Align these lines with the surrounding 4-space indentation used in this class.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]