Re: [PR] HIVE-29413: Generalise column related APIs in Table.java [hive]

via GitHub Thu, 18 Jun 2026 08:04:31 -0700


Copilot commented on code in PR #6413:
URL: https://github.com/apache/hive/pull/6413#discussion_r3436838478



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -12008,47 +12013,44 @@ private Operator genTablePlan(String alias, QB qb) 
throws SemanticException {
       // Determine row schema for TSOP.
       // Include column names from SerDe, the partition and virtual columns.
       rwsch = new RowResolver();
-      try {
-        // Including parameters passed in the query
-        if (properties != null) {
-          for (Entry<String, String> prop : properties.entrySet()) {
-            if (tab.getSerdeParam(prop.getKey()) != null) {
-              LOG.warn("SerDe property in input query overrides stored SerDe 
property");
-            }
-            tab.setSerdeParam(prop.getKey(), prop.getValue());
-          }
-        }
-        // Obtain inspector for schema
-        final Deserializer deserializer = tab.getDeserializer();
-        StructObjectInspector rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
-
-        deserializer.handleJobLevelConfiguration(conf);
-        List<? extends StructField> fields = rowObjectInspector
-            .getAllStructFieldRefs();
-        Set<String> partCols = tab.hasNonNativePartitionSupport() ?
-            Sets.newHashSet(tab.getPartColNames()) : Collections.emptySet();
-        for (int i = 0; i < fields.size(); i++) {
-          /**
-           * if the column is a skewed column, use ColumnInfo accordingly
-           */
-          ColumnInfo colInfo = new ColumnInfo(fields.get(i).getFieldName(),
-              TypeInfoUtils.getTypeInfoFromObjectInspector(fields.get(i)
-                  .getFieldObjectInspector()), alias, false);
-          if (partCols.contains(colInfo.getInternalName())) {
-            colInfo.setHiddenPartitionCol(true);
+      // Including parameters passed in the query
+      if (properties != null) {
+        for (Entry<String, String> prop : properties.entrySet()) {
+          if (tab.getSerdeParam(prop.getKey()) != null) {
+            LOG.warn("SerDe property in input query overrides stored SerDe 
property");
           }
-          colInfo.setSkewedCol(isSkewedCol(alias, qb, 
fields.get(i).getFieldName()));
-          rwsch.put(alias, fields.get(i).getFieldName(), colInfo);
+          tab.setSerdeParam(prop.getKey(), prop.getValue());
         }
+      }
+      final Deserializer deserializer = tab.getDeserializer();
+      deserializer.handleJobLevelConfiguration(conf);
+      StructObjectInspector rowObjectInspector;
+      try {
+        rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
       } catch (SerDeException e) {
         throw new RuntimeException(e);
       }
-      // Hack!! - refactor once the metadata APIs with types are ready
-      // Finally add the partitioning columns
-      for (FieldSchema part_col : tab.getPartCols()) {
-        LOG.trace("Adding partition col: " + part_col);
-        rwsch.put(alias, part_col.getName(), new ColumnInfo(part_col.getName(),
-            TypeInfoFactory.getPrimitiveTypeInfo(part_col.getType()), alias, 
true));
+
+      int colCount = tab.getAllCols().size();
+      List<ColumnInfo> colInfoList = new 
ArrayList<>(Collections.nCopies(colCount, null));
+
+      // Build column info from the SerDe row schema (authoritative 
names/types), placed by table column index.
+      for (StructField field : rowObjectInspector.getAllStructFieldRefs()) {
+        String fieldName = field.getFieldName();
+        ColumnInfo colInfo = new ColumnInfo(fieldName,
+            
TypeInfoUtils.getTypeInfoFromObjectInspector(field.getFieldObjectInspector()), 
alias, false);
+        colInfo.setSkewedCol(isSkewedCol(alias, qb, fieldName));
+        colInfoList.set(tab.getColumnIndexByName(fieldName), colInfo);
+      }
+      for (FieldSchema partCol : tab.getPartCols()) {
+        LOG.trace("Adding partition col: {} ", partCol);
+        ColumnInfo colInfo = new ColumnInfo(partCol.getName(),
+            TypeInfoFactory.getPrimitiveTypeInfo(partCol.getType()), alias, 
true);
+        colInfoList.set(tab.getColumnIndexByName(partCol.getName()), colInfo);
+      }

Review Comment:
   `tab.getColumnIndexByName(partCol.getName())` is also assumed non-null. If 
`getPartCols()` returns a column name that isn't present in the table's column 
index (e.g. partition evolution / handler mismatch), this will NPE. It would be 
safer to validate and raise a `SemanticException` with the table/column name.



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java:
##########
@@ -728,8 +750,50 @@ private boolean isField(String col) {
     return false;
   }
 
+  private void  ensureColumnsIndexed() {
+    if (columnsByName != null) {
+      return;
+    }
+    Map<String, TableColumn> indexedColumns = new HashMap<>();
+    List<FieldSchema> fsList = new ArrayList<>(getColsInternal(false));
+    if (!hasNonNativePartitionSupport() || isView()) {
+      fsList.addAll(getPartitionKeys());
+    }
+    for (int i = 0; i < fsList.size(); i++) {
+      indexedColumns.put(fsList.get(i).getName().toLowerCase(), new 
TableColumn(i, fsList.get(i)));
+    }
+    columnsByName = indexedColumns;
+  }
+
+  public Integer getColumnIndexByName(String colName) {
+    ensureColumnsIndexed();
+    TableColumn column = columnsByName.get(colName.toLowerCase());
+    return column != null ? column.index() : null;
+  }

Review Comment:
   `getColumnIndexByName` returns `null` for unknown columns, but all current 
call sites in this PR auto-unbox it to `int` (e.g. 
`List#set(tab.getColumnIndexByName(...), ...)`), which will throw a 
`NullPointerException` with little context. Since this is a new API, consider 
failing fast here with an explicit exception message instead of returning 
`null`.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -12008,47 +12013,44 @@ private Operator genTablePlan(String alias, QB qb) 
throws SemanticException {
       // Determine row schema for TSOP.
       // Include column names from SerDe, the partition and virtual columns.
       rwsch = new RowResolver();
-      try {
-        // Including parameters passed in the query
-        if (properties != null) {
-          for (Entry<String, String> prop : properties.entrySet()) {
-            if (tab.getSerdeParam(prop.getKey()) != null) {
-              LOG.warn("SerDe property in input query overrides stored SerDe 
property");
-            }
-            tab.setSerdeParam(prop.getKey(), prop.getValue());
-          }
-        }
-        // Obtain inspector for schema
-        final Deserializer deserializer = tab.getDeserializer();
-        StructObjectInspector rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
-
-        deserializer.handleJobLevelConfiguration(conf);
-        List<? extends StructField> fields = rowObjectInspector
-            .getAllStructFieldRefs();
-        Set<String> partCols = tab.hasNonNativePartitionSupport() ?
-            Sets.newHashSet(tab.getPartColNames()) : Collections.emptySet();
-        for (int i = 0; i < fields.size(); i++) {
-          /**
-           * if the column is a skewed column, use ColumnInfo accordingly
-           */
-          ColumnInfo colInfo = new ColumnInfo(fields.get(i).getFieldName(),
-              TypeInfoUtils.getTypeInfoFromObjectInspector(fields.get(i)
-                  .getFieldObjectInspector()), alias, false);
-          if (partCols.contains(colInfo.getInternalName())) {
-            colInfo.setHiddenPartitionCol(true);
+      // Including parameters passed in the query
+      if (properties != null) {
+        for (Entry<String, String> prop : properties.entrySet()) {
+          if (tab.getSerdeParam(prop.getKey()) != null) {
+            LOG.warn("SerDe property in input query overrides stored SerDe 
property");
           }
-          colInfo.setSkewedCol(isSkewedCol(alias, qb, 
fields.get(i).getFieldName()));
-          rwsch.put(alias, fields.get(i).getFieldName(), colInfo);
+          tab.setSerdeParam(prop.getKey(), prop.getValue());
         }
+      }
+      final Deserializer deserializer = tab.getDeserializer();
+      deserializer.handleJobLevelConfiguration(conf);
+      StructObjectInspector rowObjectInspector;
+      try {
+        rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
       } catch (SerDeException e) {
         throw new RuntimeException(e);
       }
-      // Hack!! - refactor once the metadata APIs with types are ready
-      // Finally add the partitioning columns
-      for (FieldSchema part_col : tab.getPartCols()) {
-        LOG.trace("Adding partition col: " + part_col);
-        rwsch.put(alias, part_col.getName(), new ColumnInfo(part_col.getName(),
-            TypeInfoFactory.getPrimitiveTypeInfo(part_col.getType()), alias, 
true));
+
+      int colCount = tab.getAllCols().size();
+      List<ColumnInfo> colInfoList = new 
ArrayList<>(Collections.nCopies(colCount, null));
+
+      // Build column info from the SerDe row schema (authoritative 
names/types), placed by table column index.
+      for (StructField field : rowObjectInspector.getAllStructFieldRefs()) {
+        String fieldName = field.getFieldName();
+        ColumnInfo colInfo = new ColumnInfo(fieldName,
+            
TypeInfoUtils.getTypeInfoFromObjectInspector(field.getFieldObjectInspector()), 
alias, false);
+        colInfo.setSkewedCol(isSkewedCol(alias, qb, fieldName));
+        colInfoList.set(tab.getColumnIndexByName(fieldName), colInfo);
+      }

Review Comment:
   `tab.getColumnIndexByName(fieldName)` can return `null` (e.g. if the SerDe 
exposes a field name not present in the table schema). The current code will 
NPE via auto-unboxing in `List#set`, turning a schema mismatch into a 
hard-to-debug runtime exception. Consider validating the index and throwing a 
`SemanticException` with context.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -12008,47 +12013,44 @@ private Operator genTablePlan(String alias, QB qb) 
throws SemanticException {
       // Determine row schema for TSOP.
       // Include column names from SerDe, the partition and virtual columns.
       rwsch = new RowResolver();
-      try {
-        // Including parameters passed in the query
-        if (properties != null) {
-          for (Entry<String, String> prop : properties.entrySet()) {
-            if (tab.getSerdeParam(prop.getKey()) != null) {
-              LOG.warn("SerDe property in input query overrides stored SerDe 
property");
-            }
-            tab.setSerdeParam(prop.getKey(), prop.getValue());
-          }
-        }
-        // Obtain inspector for schema
-        final Deserializer deserializer = tab.getDeserializer();
-        StructObjectInspector rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
-
-        deserializer.handleJobLevelConfiguration(conf);
-        List<? extends StructField> fields = rowObjectInspector
-            .getAllStructFieldRefs();
-        Set<String> partCols = tab.hasNonNativePartitionSupport() ?
-            Sets.newHashSet(tab.getPartColNames()) : Collections.emptySet();
-        for (int i = 0; i < fields.size(); i++) {
-          /**
-           * if the column is a skewed column, use ColumnInfo accordingly
-           */
-          ColumnInfo colInfo = new ColumnInfo(fields.get(i).getFieldName(),
-              TypeInfoUtils.getTypeInfoFromObjectInspector(fields.get(i)
-                  .getFieldObjectInspector()), alias, false);
-          if (partCols.contains(colInfo.getInternalName())) {
-            colInfo.setHiddenPartitionCol(true);
+      // Including parameters passed in the query
+      if (properties != null) {
+        for (Entry<String, String> prop : properties.entrySet()) {
+          if (tab.getSerdeParam(prop.getKey()) != null) {
+            LOG.warn("SerDe property in input query overrides stored SerDe 
property");
           }
-          colInfo.setSkewedCol(isSkewedCol(alias, qb, 
fields.get(i).getFieldName()));
-          rwsch.put(alias, fields.get(i).getFieldName(), colInfo);
+          tab.setSerdeParam(prop.getKey(), prop.getValue());
         }
+      }
+      final Deserializer deserializer = tab.getDeserializer();
+      deserializer.handleJobLevelConfiguration(conf);
+      StructObjectInspector rowObjectInspector;
+      try {
+        rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
       } catch (SerDeException e) {
         throw new RuntimeException(e);
       }
-      // Hack!! - refactor once the metadata APIs with types are ready
-      // Finally add the partitioning columns
-      for (FieldSchema part_col : tab.getPartCols()) {
-        LOG.trace("Adding partition col: " + part_col);
-        rwsch.put(alias, part_col.getName(), new ColumnInfo(part_col.getName(),
-            TypeInfoFactory.getPrimitiveTypeInfo(part_col.getType()), alias, 
true));
+
+      int colCount = tab.getAllCols().size();
+      List<ColumnInfo> colInfoList = new 
ArrayList<>(Collections.nCopies(colCount, null));
+
+      // Build column info from the SerDe row schema (authoritative 
names/types), placed by table column index.
+      for (StructField field : rowObjectInspector.getAllStructFieldRefs()) {
+        String fieldName = field.getFieldName();
+        ColumnInfo colInfo = new ColumnInfo(fieldName,
+            
TypeInfoUtils.getTypeInfoFromObjectInspector(field.getFieldObjectInspector()), 
alias, false);
+        colInfo.setSkewedCol(isSkewedCol(alias, qb, fieldName));
+        colInfoList.set(tab.getColumnIndexByName(fieldName), colInfo);
+      }
+      for (FieldSchema partCol : tab.getPartCols()) {
+        LOG.trace("Adding partition col: {} ", partCol);
+        ColumnInfo colInfo = new ColumnInfo(partCol.getName(),
+            TypeInfoFactory.getPrimitiveTypeInfo(partCol.getType()), alias, 
true);
+        colInfoList.set(tab.getColumnIndexByName(partCol.getName()), colInfo);
+      }
+
+      for (ColumnInfo colInfo : colInfoList) {
+        rwsch.put(alias, colInfo.getInternalName(), colInfo);
       }

Review Comment:
   `colInfoList` is initialized with nulls and populated by name->index 
lookups. If any column isn't populated (e.g. SerDe schema mismatch), `colInfo` 
will be null here and `colInfo.getInternalName()` will NPE. Consider explicitly 
checking for null and failing fast with a clear `SemanticException` so the 
cause is obvious.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -12008,47 +12013,44 @@ private Operator genTablePlan(String alias, QB qb) 
throws SemanticException {
       // Determine row schema for TSOP.
       // Include column names from SerDe, the partition and virtual columns.
       rwsch = new RowResolver();
-      try {
-        // Including parameters passed in the query
-        if (properties != null) {
-          for (Entry<String, String> prop : properties.entrySet()) {
-            if (tab.getSerdeParam(prop.getKey()) != null) {
-              LOG.warn("SerDe property in input query overrides stored SerDe 
property");
-            }
-            tab.setSerdeParam(prop.getKey(), prop.getValue());
-          }
-        }
-        // Obtain inspector for schema
-        final Deserializer deserializer = tab.getDeserializer();
-        StructObjectInspector rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
-
-        deserializer.handleJobLevelConfiguration(conf);
-        List<? extends StructField> fields = rowObjectInspector
-            .getAllStructFieldRefs();
-        Set<String> partCols = tab.hasNonNativePartitionSupport() ?
-            Sets.newHashSet(tab.getPartColNames()) : Collections.emptySet();
-        for (int i = 0; i < fields.size(); i++) {
-          /**
-           * if the column is a skewed column, use ColumnInfo accordingly
-           */
-          ColumnInfo colInfo = new ColumnInfo(fields.get(i).getFieldName(),
-              TypeInfoUtils.getTypeInfoFromObjectInspector(fields.get(i)
-                  .getFieldObjectInspector()), alias, false);
-          if (partCols.contains(colInfo.getInternalName())) {
-            colInfo.setHiddenPartitionCol(true);
+      // Including parameters passed in the query
+      if (properties != null) {
+        for (Entry<String, String> prop : properties.entrySet()) {
+          if (tab.getSerdeParam(prop.getKey()) != null) {
+            LOG.warn("SerDe property in input query overrides stored SerDe 
property");
           }
-          colInfo.setSkewedCol(isSkewedCol(alias, qb, 
fields.get(i).getFieldName()));
-          rwsch.put(alias, fields.get(i).getFieldName(), colInfo);
+          tab.setSerdeParam(prop.getKey(), prop.getValue());
         }
+      }
+      final Deserializer deserializer = tab.getDeserializer();
+      deserializer.handleJobLevelConfiguration(conf);
+      StructObjectInspector rowObjectInspector;
+      try {
+        rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
       } catch (SerDeException e) {
         throw new RuntimeException(e);
       }
-      // Hack!! - refactor once the metadata APIs with types are ready
-      // Finally add the partitioning columns
-      for (FieldSchema part_col : tab.getPartCols()) {
-        LOG.trace("Adding partition col: " + part_col);
-        rwsch.put(alias, part_col.getName(), new ColumnInfo(part_col.getName(),
-            TypeInfoFactory.getPrimitiveTypeInfo(part_col.getType()), alias, 
true));
+
+      int colCount = tab.getAllCols().size();
+      List<ColumnInfo> colInfoList = new 
ArrayList<>(Collections.nCopies(colCount, null));
+
+      // Build column info from the SerDe row schema (authoritative 
names/types), placed by table column index.
+      for (StructField field : rowObjectInspector.getAllStructFieldRefs()) {
+        String fieldName = field.getFieldName();
+        ColumnInfo colInfo = new ColumnInfo(fieldName,
+            
TypeInfoUtils.getTypeInfoFromObjectInspector(field.getFieldObjectInspector()), 
alias, false);
+        colInfo.setSkewedCol(isSkewedCol(alias, qb, fieldName));
+        colInfoList.set(tab.getColumnIndexByName(fieldName), colInfo);
+      }
+      for (FieldSchema partCol : tab.getPartCols()) {
+        LOG.trace("Adding partition col: {} ", partCol);
+        ColumnInfo colInfo = new ColumnInfo(partCol.getName(),
+            TypeInfoFactory.getPrimitiveTypeInfo(partCol.getType()), alias, 
true);
+        colInfoList.set(tab.getColumnIndexByName(partCol.getName()), colInfo);
+      }
+
+      for (ColumnInfo colInfo : colInfoList) {
+        rwsch.put(alias, colInfo.getInternalName(), colInfo);
       }

Review Comment:
   `colInfoList` is initialized with nulls and populated by name/index lookups; 
if any column isn't resolved (e.g., missing from SerDe schema), this loop will 
NPE on `colInfo.getInternalName()`. It would be better to explicitly detect 
unresolved entries and throw a `SemanticException` that explains which 
column(s) could not be mapped.



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java:
##########
@@ -728,8 +750,50 @@ private boolean isField(String col) {
     return false;
   }
 
+  private void  ensureColumnsIndexed() {
+    if (columnsByName != null) {
+      return;
+    }
+    Map<String, TableColumn> indexedColumns = new HashMap<>();
+    List<FieldSchema> fsList = new ArrayList<>(getColsInternal(false));
+    if (!hasNonNativePartitionSupport() || isView()) {
+      fsList.addAll(getPartitionKeys());
+    }
+    for (int i = 0; i < fsList.size(); i++) {
+      indexedColumns.put(fsList.get(i).getName().toLowerCase(), new 
TableColumn(i, fsList.get(i)));
+    }
+    columnsByName = indexedColumns;
+  }
+
+  public Integer getColumnIndexByName(String colName) {
+    ensureColumnsIndexed();
+    TableColumn column = columnsByName.get(colName.toLowerCase());
+    return column != null ? column.index() : null;
+  }

Review Comment:
   `getColumnIndexByName` returns `null` for unknown columns. Multiple call 
sites in this PR immediately auto-unbox the result to `int` or use it for list 
indexing, which will NPE without context if a column name isn't found (e.g., 
case differences, SerDe schema mismatch, or storage handler returning a 
partition key not present in the table schema). Consider changing this API to 
fail fast with a descriptive exception (or return a sentinel like `-1`) so 
callers can handle errors explicitly.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java:
##########
@@ -230,7 +229,7 @@ private MergeStatement.UpdateClause handleUpdate(ASTNode 
whenMatchedUpdateClause
                                                    String 
deleteExtraPredicate) throws SemanticException {
     assert whenMatchedUpdateClause.getType() == HiveParser.TOK_MATCHED;
     assert getWhenClauseOperation(whenMatchedUpdateClause).getType() == 
HiveParser.TOK_UPDATE;
-    Map<String, String> newValuesMap = new 
HashMap<>(targetTable.getCols().size() + targetTable.getPartCols().size());
+    Map<String, String> newValuesMap = 
HashMap.newHashMap(targetTable.getAllCols().size());

Review Comment:
   `java.util.HashMap` does not have a `newHashMap(int)` factory method; this 
will not compile. Use the `HashMap` constructor with an initial capacity 
instead.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -12008,47 +12013,44 @@ private Operator genTablePlan(String alias, QB qb) 
throws SemanticException {
       // Determine row schema for TSOP.
       // Include column names from SerDe, the partition and virtual columns.
       rwsch = new RowResolver();
-      try {
-        // Including parameters passed in the query
-        if (properties != null) {
-          for (Entry<String, String> prop : properties.entrySet()) {
-            if (tab.getSerdeParam(prop.getKey()) != null) {
-              LOG.warn("SerDe property in input query overrides stored SerDe 
property");
-            }
-            tab.setSerdeParam(prop.getKey(), prop.getValue());
-          }
-        }
-        // Obtain inspector for schema
-        final Deserializer deserializer = tab.getDeserializer();
-        StructObjectInspector rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
-
-        deserializer.handleJobLevelConfiguration(conf);
-        List<? extends StructField> fields = rowObjectInspector
-            .getAllStructFieldRefs();
-        Set<String> partCols = tab.hasNonNativePartitionSupport() ?
-            Sets.newHashSet(tab.getPartColNames()) : Collections.emptySet();
-        for (int i = 0; i < fields.size(); i++) {
-          /**
-           * if the column is a skewed column, use ColumnInfo accordingly
-           */
-          ColumnInfo colInfo = new ColumnInfo(fields.get(i).getFieldName(),
-              TypeInfoUtils.getTypeInfoFromObjectInspector(fields.get(i)
-                  .getFieldObjectInspector()), alias, false);
-          if (partCols.contains(colInfo.getInternalName())) {
-            colInfo.setHiddenPartitionCol(true);
+      // Including parameters passed in the query
+      if (properties != null) {
+        for (Entry<String, String> prop : properties.entrySet()) {
+          if (tab.getSerdeParam(prop.getKey()) != null) {
+            LOG.warn("SerDe property in input query overrides stored SerDe 
property");
           }
-          colInfo.setSkewedCol(isSkewedCol(alias, qb, 
fields.get(i).getFieldName()));
-          rwsch.put(alias, fields.get(i).getFieldName(), colInfo);
+          tab.setSerdeParam(prop.getKey(), prop.getValue());
         }
+      }
+      final Deserializer deserializer = tab.getDeserializer();
+      deserializer.handleJobLevelConfiguration(conf);
+      StructObjectInspector rowObjectInspector;
+      try {
+        rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
       } catch (SerDeException e) {
         throw new RuntimeException(e);
       }
-      // Hack!! - refactor once the metadata APIs with types are ready
-      // Finally add the partitioning columns
-      for (FieldSchema part_col : tab.getPartCols()) {
-        LOG.trace("Adding partition col: " + part_col);
-        rwsch.put(alias, part_col.getName(), new ColumnInfo(part_col.getName(),
-            TypeInfoFactory.getPrimitiveTypeInfo(part_col.getType()), alias, 
true));
+
+      int colCount = tab.getAllCols().size();
+      List<ColumnInfo> colInfoList = new 
ArrayList<>(Collections.nCopies(colCount, null));
+
+      // Build column info from the SerDe row schema (authoritative 
names/types), placed by table column index.
+      for (StructField field : rowObjectInspector.getAllStructFieldRefs()) {
+        String fieldName = field.getFieldName();
+        ColumnInfo colInfo = new ColumnInfo(fieldName,
+            
TypeInfoUtils.getTypeInfoFromObjectInspector(field.getFieldObjectInspector()), 
alias, false);
+        colInfo.setSkewedCol(isSkewedCol(alias, qb, fieldName));
+        colInfoList.set(tab.getColumnIndexByName(fieldName), colInfo);
+      }
+      for (FieldSchema partCol : tab.getPartCols()) {
+        LOG.trace("Adding partition col: {} ", partCol);
+        ColumnInfo colInfo = new ColumnInfo(partCol.getName(),
+            TypeInfoFactory.getPrimitiveTypeInfo(partCol.getType()), alias, 
true);
+        colInfoList.set(tab.getColumnIndexByName(partCol.getName()), colInfo);
+      }

Review Comment:
   `tab.getColumnIndexByName(partCol.getName())` can return `null`, which will 
NPE when used as the index into `colInfoList`. This should be validated and 
reported as a `SemanticException` with context, rather than failing with an NPE.



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java:
##########
@@ -1009,12 +1083,12 @@ public boolean isMaterializedView() {
    *          Use the information from this partition.
    * @return Partition name to value mapping.
    */
-  public LinkedHashMap<String, String> createSpec(
+  public Map<String, String> createSpec(
       org.apache.hadoop.hive.metastore.api.Partition tp) {
 
     List<FieldSchema> fsl = getPartCols();
     List<String> tpl = tp.getValues();
-    LinkedHashMap<String, String> spec = new LinkedHashMap<String, 
String>(fsl.size());
+    Map<String, String> spec = LinkedHashMap.newLinkedHashMap(fsl.size());

Review Comment:
   `java.util.LinkedHashMap` doesn't provide a `newLinkedHashMap(int)` factory 
method, so this will not compile. Use the constructor instead (or 
`Maps.newLinkedHashMapWithExpectedSize` if you prefer a helper).



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/MergeSemanticAnalyzer.java:
##########
@@ -230,7 +229,7 @@ private MergeStatement.UpdateClause handleUpdate(ASTNode 
whenMatchedUpdateClause
                                                    String 
deleteExtraPredicate) throws SemanticException {
     assert whenMatchedUpdateClause.getType() == HiveParser.TOK_MATCHED;
     assert getWhenClauseOperation(whenMatchedUpdateClause).getType() == 
HiveParser.TOK_UPDATE;
-    Map<String, String> newValuesMap = new 
HashMap<>(targetTable.getCols().size() + targetTable.getPartCols().size());
+    Map<String, String> newValuesMap = 
HashMap.newHashMap(targetTable.getAllCols().size());

Review Comment:
   `java.util.HashMap` has no `newHashMap(int)` factory method, so this will 
not compile. Use `new HashMap<>(...)` (or 
`Maps.newHashMapWithExpectedSize(...)` if you want the Guava helper).



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -12008,47 +12013,44 @@ private Operator genTablePlan(String alias, QB qb) 
throws SemanticException {
       // Determine row schema for TSOP.
       // Include column names from SerDe, the partition and virtual columns.
       rwsch = new RowResolver();
-      try {
-        // Including parameters passed in the query
-        if (properties != null) {
-          for (Entry<String, String> prop : properties.entrySet()) {
-            if (tab.getSerdeParam(prop.getKey()) != null) {
-              LOG.warn("SerDe property in input query overrides stored SerDe 
property");
-            }
-            tab.setSerdeParam(prop.getKey(), prop.getValue());
-          }
-        }
-        // Obtain inspector for schema
-        final Deserializer deserializer = tab.getDeserializer();
-        StructObjectInspector rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
-
-        deserializer.handleJobLevelConfiguration(conf);
-        List<? extends StructField> fields = rowObjectInspector
-            .getAllStructFieldRefs();
-        Set<String> partCols = tab.hasNonNativePartitionSupport() ?
-            Sets.newHashSet(tab.getPartColNames()) : Collections.emptySet();
-        for (int i = 0; i < fields.size(); i++) {
-          /**
-           * if the column is a skewed column, use ColumnInfo accordingly
-           */
-          ColumnInfo colInfo = new ColumnInfo(fields.get(i).getFieldName(),
-              TypeInfoUtils.getTypeInfoFromObjectInspector(fields.get(i)
-                  .getFieldObjectInspector()), alias, false);
-          if (partCols.contains(colInfo.getInternalName())) {
-            colInfo.setHiddenPartitionCol(true);
+      // Including parameters passed in the query
+      if (properties != null) {
+        for (Entry<String, String> prop : properties.entrySet()) {
+          if (tab.getSerdeParam(prop.getKey()) != null) {
+            LOG.warn("SerDe property in input query overrides stored SerDe 
property");
           }
-          colInfo.setSkewedCol(isSkewedCol(alias, qb, 
fields.get(i).getFieldName()));
-          rwsch.put(alias, fields.get(i).getFieldName(), colInfo);
+          tab.setSerdeParam(prop.getKey(), prop.getValue());
         }
+      }
+      final Deserializer deserializer = tab.getDeserializer();
+      deserializer.handleJobLevelConfiguration(conf);
+      StructObjectInspector rowObjectInspector;
+      try {
+        rowObjectInspector = (StructObjectInspector) 
deserializer.getObjectInspector();
       } catch (SerDeException e) {
         throw new RuntimeException(e);
       }
-      // Hack!! - refactor once the metadata APIs with types are ready
-      // Finally add the partitioning columns
-      for (FieldSchema part_col : tab.getPartCols()) {
-        LOG.trace("Adding partition col: " + part_col);
-        rwsch.put(alias, part_col.getName(), new ColumnInfo(part_col.getName(),
-            TypeInfoFactory.getPrimitiveTypeInfo(part_col.getType()), alias, 
true));
+
+      int colCount = tab.getAllCols().size();
+      List<ColumnInfo> colInfoList = new 
ArrayList<>(Collections.nCopies(colCount, null));
+
+      // Build column info from the SerDe row schema (authoritative 
names/types), placed by table column index.
+      for (StructField field : rowObjectInspector.getAllStructFieldRefs()) {
+        String fieldName = field.getFieldName();
+        ColumnInfo colInfo = new ColumnInfo(fieldName,
+            
TypeInfoUtils.getTypeInfoFromObjectInspector(field.getFieldObjectInspector()), 
alias, false);
+        colInfo.setSkewedCol(isSkewedCol(alias, qb, fieldName));
+        colInfoList.set(tab.getColumnIndexByName(fieldName), colInfo);
+      }

Review Comment:
   `tab.getColumnIndexByName(fieldName)` can return `null` (unknown column), 
which will NPE when auto-unboxed by `List#set(int, ...)`. This currently turns 
SerDe/table schema mismatches into an unhelpful NPE during compilation; it 
should fail fast with a `SemanticException` that includes the table/column name.



##########
storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java:
##########
@@ -90,11 +90,11 @@ public static final CacheTag build(String catalogName, 
String dbAndTableName) {
     return build(catalogName + "." + dbAndTableName);
   }
 
-  public static final CacheTag build(String catalogName, String 
dbAndTableName, LinkedHashMap<String, String> partDescMap) {
+  public static final CacheTag build(String catalogName, String 
dbAndTableName, Map<String, String> partDescMap) {
     return build(catalogName + "." + dbAndTableName, partDescMap);
   }
 
-  public static final CacheTag build(String fullTableName, 
LinkedHashMap<String, String> partDescMap) {
+  public static final CacheTag build(String fullTableName, Map<String, String> 
partDescMap) {
     if (StringUtils.isEmpty(fullTableName) || partDescMap == null || 
partDescMap.isEmpty()) {
       throw new IllegalArgumentException();
     }

Review Comment:
   Changing the parameter type from `LinkedHashMap` to `Map` makes iteration 
order undefined, but this method encodes the partition spec by iterating 
`partDescMap.entrySet()`. That can make cache tags non-deterministic (and break 
cache key stability) when callers pass an unordered map implementation. 
Consider canonicalizing the entry order (e.g., sort by key) before encoding.



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java:
##########
@@ -1009,12 +1083,12 @@ public boolean isMaterializedView() {
    *          Use the information from this partition.
    * @return Partition name to value mapping.
    */
-  public LinkedHashMap<String, String> createSpec(
+  public Map<String, String> createSpec(
       org.apache.hadoop.hive.metastore.api.Partition tp) {
 
     List<FieldSchema> fsl = getPartCols();
     List<String> tpl = tp.getValues();
-    LinkedHashMap<String, String> spec = new LinkedHashMap<String, 
String>(fsl.size());
+    Map<String, String> spec = LinkedHashMap.newLinkedHashMap(fsl.size());

Review Comment:
   `java.util.LinkedHashMap` does not have a `newLinkedHashMap(int)` factory 
method; this will not compile. Use `new LinkedHashMap<>(initialCapacity)` (or 
Guava `Maps.newLinkedHashMapWithExpectedSize`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-29413: Generalise column related APIs in Table.java [hive]

Reply via email to