kasakrisz commented on code in PR #5498:
URL: https://github.com/apache/hive/pull/5498#discussion_r1808463369
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -471,6 +483,22 @@ public Map<String, String> getBasicStatistics(Partish partish) {
       Map<String, String> summary = table.currentSnapshot().summary();
       if (summary != null) {
+        if (Boolean.parseBoolean(summary.get(SnapshotSummary.PARTITION_SUMMARY_PROP)) &&
+            partish.getPartition() != null) {
+          String key = SnapshotSummary.CHANGED_PARTITION_PREFIX + partish.getPartition().getName();
+          Map<String, String> map = Maps.newHashMap();
+
+          if (summary.containsKey(key)) {
+            StringTokenizer tokenizer = new StringTokenizer(summary.get(key), ",");
+            while (tokenizer.hasMoreTokens()) {
+              String token = tokenizer.nextToken();
+              String[] keyValue = token.split("=");
+              map.put(keyValue[0], keyValue[1]);
+            }
+          }
+          summary = map;

Review Comment:
   This can be extracted to a method which parses the summary into a `Map`. This may help reduce the cyclomatic complexity of the `getBasicStatistics` method (a possible shape of that extraction is sketched at the end of this message).


##########
ql/src/test/results/clientpositive/llap/vector_bucket.q.out:
##########
@@ -61,10 +61,11 @@ STAGE PLANS:
             Execution mode: llap
             LLAP IO: no inputs
             Map Vectorization:
-                enabled: false
+                enabled: true
                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
-                enabledConditionsNotMet: Could not enable vectorization due to partition column names size 1 is greater than the number of table column names size 0 IS false
                 inputFileFormats: org.apache.hadoop.hive.ql.io.NullRowsInputFormat
+                notVectorizedReason: UDTF Operator (UDTF) not supported
+                vectorized: false

Review Comment:
   What is the cause of these changes? This is not an Iceberg-related test.


##########
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnInfo.java:
##########
@@ -166,9 +170,13 @@ public String getTabAlias() {
   }
   public boolean getIsVirtualCol() {
-    return isVirtualCol;
+    return isVirtualCol || isPartitionCol;

Review Comment:
   Please don't mix virtual columns and partition columns. These are very different things. `getIsVirtualCol()` should depend on `isVirtualCol` only.


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -445,9 +444,9 @@ public static List<DeleteFile> getDeleteFiles(Table table, String partitionPath)
             t -> ((PositionDeletesScanTask) t).file()));
   }
-  public static Expression generateExpressionFromPartitionSpec(Table table, Map<String, String> partitionSpec)
-      throws SemanticException {
-    Map<String, PartitionField> partitionFieldMap = getPartitionFields(table).stream()
+  public static Expression generateExpressionFromPartitionSpec(Table table, Map<String, String> partitionSpec,
+      boolean latestSpecOnly) throws SemanticException {
+    Map<String, PartitionField> partitionFieldMap = getPartitionFields(table, latestSpecOnly).stream()

Review Comment:
   The method `generateExpressionFromPartitionSpec` is called from two places, and the actual parameter value of `latestSpecOnly` is always `true`. How about calling
   ```
   icebergTable.spec().fields()
   ```
   instead of
   ```
   getPartitionFields(table, latestSpecOnly)
   ```
   (A sketch of this is included at the end of this message.)


##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/DummyPartition.java:
##########
@@ -51,11 +48,15 @@ public DummyPartition(Table tbl, String name) {
     this.name = name;
   }
-  public DummyPartition(Table tbl, String name,
-      Map<String, String> partSpec) {
-    setTable(tbl);
+  public DummyPartition(Table tbl, String name, Map<String, String> partSpec) throws HiveException {

Review Comment:
   Is the functionality of `DummyPartition` exploited?
   I saw that when we create `DummyPartition` instances we return a `Partition` type reference and never access any `DummyPartition`-defined method. If that is not the case, then `DummyPartition` is fine. The name is misleading, though, since these objects represent real partitions, don't they?


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -129,19 +129,19 @@ public static Table getTable(Configuration configuration, org.apache.hadoop.hive
    */
   static Table getTable(Configuration configuration, Properties properties, boolean skipCache) {
     String metaTable = properties.getProperty(IcebergAcidUtil.META_TABLE_PROPERTY);
-    String tableName = properties.getProperty(Catalogs.NAME);
-    String location = properties.getProperty(Catalogs.LOCATION);
+    Properties props = (metaTable != null) ? new Properties() : properties;
+
     if (metaTable != null) {
+      props.computeIfAbsent(InputFormatConfig.CATALOG_NAME, properties::get);
       // HiveCatalog, HadoopCatalog uses NAME to identify the metadata table
-      properties.setProperty(Catalogs.NAME, tableName + "." + metaTable);
+      props.computeIfAbsent(Catalogs.NAME, k -> properties.get(k) + "." + metaTable);
       // HadoopTable uses LOCATION to identify the metadata table
-      properties.setProperty(Catalogs.LOCATION, location + "#" + metaTable);
+      props.computeIfAbsent(Catalogs.LOCATION, k -> properties.get(k) + "#" + metaTable);
     }

Review Comment:
   Is this a necessary change? The original code had one `if` statement; the new one has five.
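   On that last point about `getTable`: purely as an illustration of keeping the branch count down, a minimal sketch is below. It is not the PR's code and its behaviour differs slightly: unlike the `computeIfAbsent` version it copies every incoming property into the new `Properties` object, and it keeps the pre-PR handling of `NAME`/`LOCATION` as-is.
   ```java
   // Hypothetical simplification, not taken from the PR: keep a single branch and only
   // copy the incoming properties when a metadata table is requested, so the caller's
   // Properties object is never mutated.
   Properties props = properties;
   if (metaTable != null) {
     props = new Properties();
     props.putAll(properties);
     // HiveCatalog, HadoopCatalog use NAME to identify the metadata table
     props.setProperty(Catalogs.NAME, properties.getProperty(Catalogs.NAME) + "." + metaTable);
     // HadoopTable uses LOCATION to identify the metadata table
     props.setProperty(Catalogs.LOCATION, properties.getProperty(Catalogs.LOCATION) + "#" + metaTable);
   }
   ```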
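   For the earlier `getBasicStatistics` comment, the suggested extraction could take roughly the shape below. The helper name `parsePartitionSummary` is invented for this sketch; its body only mirrors the tokenizing logic already shown in the diff.
   ```java
   // Hypothetical helper sketched from the diff above: parses a comma-separated
   // "key=value" snapshot summary entry into a map; a null entry yields an empty map.
   private static Map<String, String> parsePartitionSummary(String summaryEntry) {
     Map<String, String> result = Maps.newHashMap();
     if (summaryEntry == null) {
       return result;
     }
     StringTokenizer tokenizer = new StringTokenizer(summaryEntry, ",");
     while (tokenizer.hasMoreTokens()) {
       String[] keyValue = tokenizer.nextToken().split("=");
       result.put(keyValue[0], keyValue[1]);
     }
     return result;
   }
   ```
   The call site in `getBasicStatistics` would then reduce to something like `summary = parsePartitionSummary(summary.get(key));`.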
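   Similarly, for the `generateExpressionFromPartitionSpec` comment, the suggested call might slot in roughly as below; the collector is an assumption, since the PR's stream pipeline is cut off in the diff.
   ```java
   // Sketch only: use the table's current partition spec directly, given that
   // latestSpecOnly is always true at the call sites; assumes the map is keyed by
   // partition field name.
   Map<String, PartitionField> partitionFieldMap = table.spec().fields().stream()
       .collect(Collectors.toMap(PartitionField::name, Function.identity()));
   ```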