Copilot commented on code in PR #6198:
URL: https://github.com/apache/hive/pull/6198#discussion_r2985740454


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -10109,19 +10146,34 @@ public boolean deleteTableColumnStatistics(String catName, String dbName, String
     if (tableName == null) {
       throw new InvalidInputException("Table name is null.");
     }
+    final List<String> cols;
+    if (colNames != null && !colNames.isEmpty()) {
+      List<String> normalizedColNames = new ArrayList<>();
+      for (String colName : colNames){
+        if (StringUtils.isEmpty(colName)) {
+          continue;
+        }
+        // trim the extra spaces, and change to lowercase
+        normalizedColNames.add(normalizeIdentifier(colName));
+      }
+      cols = normalizedColNames;
+    } else {
+      cols = new ArrayList<>();
+    }

Review Comment:
   When `colNames` is non-empty but all entries are blank, the normalization 
loop produces an empty `cols` list; later code treats an empty list as "delete 
stats for all columns". This can unintentionally drop all stats for inputs like 
`[""]`. Consider throwing `InvalidInputException` if the caller supplied column 
names but none are valid after normalization.
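   To make the suggestion concrete, here is a standalone sketch of the guard. `normalizeIdentifier` and `InvalidInputException` are simplified stand-ins for the Hive metastore versions (the real `InvalidInputException` is a checked Thrift exception; it is unchecked here for brevity), not the actual classes:

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnNameGuard {
  // Unchecked stand-in for the metastore's checked InvalidInputException.
  static class InvalidInputException extends RuntimeException {
    InvalidInputException(String msg) { super(msg); }
  }

  // Simplified stand-in for the metastore's normalizeIdentifier: trim and lowercase.
  static String normalizeIdentifier(String id) {
    return id.trim().toLowerCase();
  }

  static List<String> normalizeColNames(List<String> colNames) {
    List<String> cols = new ArrayList<>();
    if (colNames == null || colNames.isEmpty()) {
      return cols; // no names supplied: empty list means "all columns" downstream
    }
    for (String colName : colNames) {
      if (colName == null || colName.trim().isEmpty()) {
        continue; // skip blank entries instead of normalizing them
      }
      cols.add(normalizeIdentifier(colName));
    }
    // Names were supplied but none survived normalization: reject rather than
    // silently widening the delete to every column.
    if (cols.isEmpty()) {
      throw new InvalidInputException("Column names were supplied but none are valid: " + colNames);
    }
    return cols;
  }
}
```

   With this guard, `[""]` raises an error instead of falling through to the "delete stats for all columns" path.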



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/directsql/DirectSqlDeleteStats.java:
##########
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.directsql;
+
+import javax.jdo.PersistenceManager;
+import javax.jdo.datastore.JDOConnection;
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hive.common.StatsSetupConst;
+import org.apache.hadoop.hive.metastore.Batchable;
+import org.apache.hadoop.hive.metastore.QueryWrapper;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.hadoop.hive.metastore.directsql.MetaStoreDirectSql.getIdListForIn;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.closeDbConn;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.executeWithArray;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.extractSqlClob;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.makeParams;
+
+public class DirectSqlDeleteStats {
+  private static final Logger LOG = LoggerFactory.getLogger(DirectSqlDeleteStats.class);
+  private final MetaStoreDirectSql directSql;
+  private final PersistenceManager pm;
+  private final int batchSize;
+
+  public DirectSqlDeleteStats(MetaStoreDirectSql directSql, PersistenceManager pm) {
+    this.directSql = directSql;
+    this.pm = pm;
+    this.batchSize = directSql.getDirectSqlBatchSize();
+  }
+
+  public boolean deletePartitionColumnStats(String catName, String dbName, String tblName,
+      List<String> partNames, List<String> colNames, String engine) throws MetaException {
+    List<Long> partIds = Batchable.runBatched(batchSize, partNames, new Batchable<String, Long>() {
+      @Override
+      public List<Long> run(List<String> input) throws Exception {
+        String sqlFilter = "\"PARTITIONS\".\"PART_NAME\" in  (" + makeParams(input.size()) + ")";
+        List<Long> partitionIds = directSql.getPartitionIdsViaSqlFilter(catName, dbName, tblName, sqlFilter,
+            input, Collections.emptyList(), -1);
+        if (!partitionIds.isEmpty()) {
+          String deleteSql = "delete from \"PART_COL_STATS\" where \"PART_ID\" in ( " + getIdListForIn(partitionIds) + ")";
+          List<Object> params = new ArrayList<>(colNames == null ? 1 : colNames.size() + 1);
+
+          if (colNames != null && !colNames.isEmpty()) {
+            deleteSql += " and \"COLUMN_NAME\" in (" + makeParams(colNames.size()) + ")";
+            params.addAll(colNames);
+          }
+
+          if (engine != null) {
+            deleteSql += " and \"ENGINE\" = ?";
+            params.add(engine);
+          }
+          try (QueryWrapper queryParams = new QueryWrapper(pm.newQuery("javax.jdo.query.SQL", deleteSql))) {
+            executeWithArray(queryParams.getInnerQuery(), params.toArray(), deleteSql);
+          }
+        }
+        return partitionIds;
+      }
+    });
+    try {
+      return updateColumnStatsAccurateForPartitions(partIds, colNames);
+    } catch (SQLException e) {
+      LOG.error("Fail to update the partitions' column accurate status", e);
+      throw new MetaException("Updating partitions's column accurate throws: " + e.getMessage());

Review Comment:
   The log and exception messages here have grammatical issues and are a bit 
unclear. Consider changing to something like "Failed to update partitions' 
COLUMN_STATS_ACCURATE" and include enough context (db/table/partition count) to 
debug; also prefer preserving the SQLException as the cause when wrapping in 
MetaException so the stack trace isn’t lost.
   ```suggestion
      String errorMsg = String.format(
          "Failed to update partitions' COLUMN_STATS_ACCURATE for catalog '%s', database '%s', table '%s' "
              + "(partitionCount=%d, columnCount=%d)",
          catName, dbName, tblName,
          partIds == null ? 0 : partIds.size(),
          colNames == null ? 0 : colNames.size());
      LOG.error(errorMsg, e);
      MetaException metaException = new MetaException(errorMsg + ": " + e.getMessage());
      metaException.initCause(e);
      throw metaException;
   ```



##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:
##########
@@ -1849,6 +1849,11 @@ public void testColumnStatistics() throws Throwable {
       List<ColumnStatisticsObj> stats = client.getTableColumnStatistics(
           dbName, tblName, Lists.newArrayList(colName[1]), ENGINE);
       assertTrue("stats are not empty: " + stats, stats.isEmpty());
+      // test if all columns are deleted from parameter COLUMN_STATS_ACCURATE
+      Map<String, String> tableParams = client.getTable(dbName, tblName).getParameters();
+      String table_column_stats_accurate = tableParams.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+      assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName, table_column_stats_accurate == null ||
+              (!table_column_stats_accurate.contains(colName[0]) && !table_column_stats_accurate.contains(colName[1])));

Review Comment:
   `COLUMN_STATS_ACCURATE` is a JSON payload; checking it via 
`String.contains(...)` is brittle (substring collisions, formatting changes, 
etc.). Prefer parsing via `StatsSetupConst.getColumnsHavingStats(tableParams)` 
and asserting the returned list does/doesn’t contain the expected column names.
   ```suggestion
      List<String> colsWithStats = StatsSetupConst.getColumnsHavingStats(tableParams);
      assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName,
          colsWithStats == null || (!colsWithStats.contains(colName[0]) && !colsWithStats.contains(colName[1])));
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/directsql/DirectSqlDeleteStats.java:
##########
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.directsql;
+
+import javax.jdo.PersistenceManager;
+import javax.jdo.datastore.JDOConnection;
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hive.common.StatsSetupConst;
+import org.apache.hadoop.hive.metastore.Batchable;
+import org.apache.hadoop.hive.metastore.QueryWrapper;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.hadoop.hive.metastore.directsql.MetaStoreDirectSql.getIdListForIn;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.closeDbConn;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.executeWithArray;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.extractSqlClob;
+import static org.apache.hadoop.hive.metastore.directsql.MetastoreDirectSqlUtils.makeParams;
+
+public class DirectSqlDeleteStats {
+  private static final Logger LOG = LoggerFactory.getLogger(DirectSqlDeleteStats.class);
+  private final MetaStoreDirectSql directSql;
+  private final PersistenceManager pm;
+  private final int batchSize;
+
+  public DirectSqlDeleteStats(MetaStoreDirectSql directSql, PersistenceManager pm) {
+    this.directSql = directSql;
+    this.pm = pm;
+    this.batchSize = directSql.getDirectSqlBatchSize();
+  }
+
+  public boolean deletePartitionColumnStats(String catName, String dbName, String tblName,
+      List<String> partNames, List<String> colNames, String engine) throws MetaException {
+    List<Long> partIds = Batchable.runBatched(batchSize, partNames, new Batchable<String, Long>() {
+      @Override
+      public List<Long> run(List<String> input) throws Exception {
+        String sqlFilter = "\"PARTITIONS\".\"PART_NAME\" in  (" + makeParams(input.size()) + ")";
+        List<Long> partitionIds = directSql.getPartitionIdsViaSqlFilter(catName, dbName, tblName, sqlFilter,
+            input, Collections.emptyList(), -1);
+        if (!partitionIds.isEmpty()) {
+          String deleteSql = "delete from \"PART_COL_STATS\" where \"PART_ID\" in ( " + getIdListForIn(partitionIds) + ")";
+          List<Object> params = new ArrayList<>(colNames == null ? 1 : colNames.size() + 1);
+
+          if (colNames != null && !colNames.isEmpty()) {
+            deleteSql += " and \"COLUMN_NAME\" in (" + makeParams(colNames.size()) + ")";
+            params.addAll(colNames);
+          }
+
+          if (engine != null) {
+            deleteSql += " and \"ENGINE\" = ?";
+            params.add(engine);
+          }
+          try (QueryWrapper queryParams = new QueryWrapper(pm.newQuery("javax.jdo.query.SQL", deleteSql))) {
+            executeWithArray(queryParams.getInnerQuery(), params.toArray(), deleteSql);
+          }
+        }
+        return partitionIds;
+      }
+    });
+    try {
+      return updateColumnStatsAccurateForPartitions(partIds, colNames);
+    } catch (SQLException e) {
+      LOG.error("Fail to update the partitions' column accurate status", e);
+      throw new MetaException("Updating partitions's column accurate throws: " + e.getMessage());
+    }
+  }
+
+  /**
+   a helper function which will get the current COLUMN_STATS_ACCURATE parameter on table level
+   and update the COLUMN_STATS_ACCURATE parameter with the new value on table level using directSql

Review Comment:
   This Javadoc block isn’t formatted as standard Javadoc (missing leading `*` 
on lines and proper sentence formatting), which can fail style checks and 
reduces readability. Please convert it to a properly formatted Javadoc comment 
or a regular block comment.
   ```suggestion
      * A helper function that gets the current {@code COLUMN_STATS_ACCURATE}
      * parameter at the table level and updates it with the new value using
      * directSql.
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -10016,21 +10019,36 @@ public boolean deletePartitionColumnStatistics(String catName, String dbName, St
     }
     dbName = org.apache.commons.lang3.StringUtils.defaultString(dbName, Warehouse.DEFAULT_DATABASE_NAME);
     catName = normalizeIdentifier(catName);
+    final List<String> cols;
+    if (colNames != null && !colNames.isEmpty()) {
+      List<String> normalizedColNames = new ArrayList<>();
+      for (String colName : colNames) {
+        if (StringUtils.isEmpty(colName)) {
+          continue;
+        }
+        // trim the extra spaces, and change to lowercase
+        normalizedColNames.add(normalizeIdentifier(colName));
+      }
+      cols = normalizedColNames;
+    } else {
+      cols = new ArrayList<>();
+    }

Review Comment:
   When `colNames` is non-empty but all entries are blank, the normalization 
loop produces an empty `cols` list; downstream logic interprets an empty list 
as "delete stats for all columns". That makes `deletePartitionColumnStatistics` 
potentially drop all column stats unexpectedly for inputs like `[""]`. Consider 
rejecting this case with `InvalidInputException`, or preserving the distinction 
between "no valid column names" vs "no column filter".
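   The second alternative can be sketched with `Optional`, so an empty filter is distinguishable from a filter that matched nothing. This is an illustrative standalone sketch, not the Hive code; `normalizeIdentifier` here is a simplified stand-in:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ColumnFilter {
  // Simplified stand-in for the metastore's normalizeIdentifier helper.
  static String normalizeIdentifier(String id) {
    return id.trim().toLowerCase();
  }

  static Optional<List<String>> columnFilter(List<String> colNames) {
    if (colNames == null || colNames.isEmpty()) {
      return Optional.empty(); // no filter supplied: delete stats for all columns
    }
    List<String> cols = new ArrayList<>();
    for (String colName : colNames) {
      if (colName != null && !colName.trim().isEmpty()) {
        cols.add(normalizeIdentifier(colName));
      }
    }
    // A present-but-empty list now means "filter matched no valid columns",
    // which downstream code can treat as a no-op instead of "all columns".
    return Optional.of(cols);
  }
}
```

   Downstream code would then delete everything only on `Optional.empty()`, and no-op (or throw) when the wrapped list is empty.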



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java:
##########
@@ -2169,15 +2169,23 @@ private static void updateTableColumnsStatsInternal(Configuration conf, ColumnSt
       throw new RuntimeException("CachedStore can only be enabled for Hive engine");
     }
     boolean succ = rawStore.deleteTableColumnStatistics(catName, dbName, tblName, colNames, engine);
+    catName = normalizeIdentifier(catName);
+    dbName = normalizeIdentifier(dbName);
+    tblName = normalizeIdentifier(tblName);
+    if (!shouldCacheTable(catName, dbName, tblName)) {
+      return succ;
+    }
+    Table cachedTable = sharedCache.getTableFromCache(catName, dbName, tblName);
+    if (cachedTable != null) {
+      if (colNames == null) {
+        StatsSetupConst.clearColumnStatsState(cachedTable.getParameters());
+      } else {
+        StatsSetupConst.removeColumnStatsState(cachedTable.getParameters(), colNames);
+      }

Review Comment:
   Cache-side update treats `colNames` as "delete all" only when it is `null`, 
but the underlying RawStore behavior treats both `null` and an empty list as 
"delete all". If `colNames` is empty, this code will call 
`removeColumnStatsState` (no-op) and leave `COLUMN_STATS_ACCURATE` stale in the 
cached table. Consider treating `colNames.isEmpty()` the same as `null` here.
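   The intended semantics can be sketched as follows. This is a hypothetical standalone model, not the Hive code: a plain `Map` stands in for `Table#getParameters()`, and `COLUMN_STATS_ACCURATE` is modeled as a comma-separated column list rather than the real JSON payload maintained by `StatsSetupConst`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class CacheStatsUpdate {
  static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";

  static void updateCachedStatsState(Map<String, String> params, List<String> colNames) {
    // Match the RawStore: both null and an empty list mean "all column stats
    // deleted", so clear the whole parameter instead of calling the per-column
    // removal, which would be a no-op and leave the cached state stale.
    if (colNames == null || colNames.isEmpty()) {
      params.remove(COLUMN_STATS_ACCURATE); // clearColumnStatsState analogue
      return;
    }
    // removeColumnStatsState analogue on the simplified representation.
    String current = params.get(COLUMN_STATS_ACCURATE);
    if (current == null) {
      return;
    }
    List<String> remaining = new ArrayList<>(Arrays.asList(current.split(",")));
    remaining.removeAll(colNames);
    if (remaining.isEmpty()) {
      params.remove(COLUMN_STATS_ACCURATE);
    } else {
      params.put(COLUMN_STATS_ACCURATE, String.join(",", remaining));
    }
  }
}
```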



##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java:
##########
@@ -1866,6 +1871,11 @@ public void testColumnStatistics() throws Throwable {
       // multiple columns
       request.setCol_names(Arrays.asList(colName));
       assertTrue(client.deleteColumnStatistics(request));
+      // test if the columns in colName array are deleted from parameter COLUMN_STATS_ACCURATE
+      tableParams = client.getTable(dbName, tblName).getParameters();
+      table_column_stats_accurate = tableParams.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
+      assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName, table_column_stats_accurate == null ||
+              (!table_column_stats_accurate.contains(colName[0]) && !table_column_stats_accurate.contains(colName[1])));

Review Comment:
   Same issue as above: validating `COLUMN_STATS_ACCURATE` by substring search 
is brittle. Consider using `StatsSetupConst.getColumnsHavingStats(...)` and 
asserting the list doesn’t contain the deleted column names.
   ```suggestion
      List<String> colsWithStats = StatsSetupConst.getColumnsHavingStats(tableParams);
      assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName,
          colsWithStats == null ||
              (!colsWithStats.contains(colName[0]) && !colsWithStats.contains(colName[1])));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

