kasakrisz commented on code in PR #5498:
URL: https://github.com/apache/hive/pull/5498#discussion_r1808463369
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -471,6 +483,22 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     Map<String, String> summary = table.currentSnapshot().summary();
     if (summary != null) {
+      if (Boolean.parseBoolean(summary.get(SnapshotSummary.PARTITION_SUMMARY_PROP)) &&
+          partish.getPartition() != null) {
+        String key = SnapshotSummary.CHANGED_PARTITION_PREFIX + partish.getPartition().getName();
+        Map<String, String> map = Maps.newHashMap();
+
+        if (summary.containsKey(key)) {
+          StringTokenizer tokenizer = new StringTokenizer(summary.get(key), ",");
+          while (tokenizer.hasMoreTokens()) {
+            String token = tokenizer.nextToken();
+            String[] keyValue = token.split("=");
+            map.put(keyValue[0], keyValue[1]);
+          }
+        }
+        summary = map;
Review Comment:
This can be extracted to a method that parses the summary entry into a `Map`.
That may help reduce the cyclomatic complexity of the `getBasicStatistics`
method.
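One possible shape for the extracted helper — a sketch only: `parseSummaryEntry` is a hypothetical name, and a plain `HashMap` replaces the Guava `Maps.newHashMap()` call from the diff so the example stays self-contained. It mirrors the inline tokenizer loop from the hunk above, with a defensive length check added for malformed tokens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class SummaryParser {
  // Hypothetical helper: parses one snapshot-summary entry of the form
  // "k1=v1,k2=v2" into a Map, as the inline loop in getBasicStatistics does.
  public static Map<String, String> parseSummaryEntry(String entry) {
    Map<String, String> map = new HashMap<>();
    if (entry == null) {
      return map;
    }
    StringTokenizer tokenizer = new StringTokenizer(entry, ",");
    while (tokenizer.hasMoreTokens()) {
      // split with limit 2 so values containing '=' are kept whole
      String[] keyValue = tokenizer.nextToken().split("=", 2);
      if (keyValue.length == 2) {
        map.put(keyValue[0], keyValue[1]);
      }
    }
    return map;
  }
}
```

With the loop behind a named method, `getBasicStatistics` keeps only the partition-summary branch itself.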
##########
ql/src/test/results/clientpositive/llap/vector_bucket.q.out:
##########
@@ -61,10 +61,11 @@ STAGE PLANS:
             Execution mode: llap
             LLAP IO: no inputs
             Map Vectorization:
-                enabled: false
+                enabled: true
                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
-                enabledConditionsNotMet: Could not enable vectorization due to partition column names size 1 is greater than the number of table column names size 0 IS false
                 inputFileFormats: org.apache.hadoop.hive.ql.io.NullRowsInputFormat
+                notVectorizedReason: UDTF Operator (UDTF) not supported
+                vectorized: false
Review Comment:
What is the cause of these changes? This is not an Iceberg-related test.
##########
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnInfo.java:
##########
@@ -166,9 +170,13 @@ public String getTabAlias() {
   }

   public boolean getIsVirtualCol() {
-    return isVirtualCol;
+    return isVirtualCol || isPartitionCol;
Review Comment:
Please don't mix virtual columns and partition columns. These are very
different things. `getIsVirtualCol()` should depend on `isVirtualCol` only.
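A minimal sketch of the point, using a hypothetical stand-in class (the real `ColumnInfo` carries many more fields): each flag gets its own accessor, so virtual-column and partition-column checks stay independent and callers opt into the distinction explicitly.

```java
// Hypothetical stand-in for ColumnInfo's two flags; illustrative only.
public class ColumnFlags {
  private final boolean isVirtualCol;
  private final boolean isPartitionCol;

  public ColumnFlags(boolean isVirtualCol, boolean isPartitionCol) {
    this.isVirtualCol = isVirtualCol;
    this.isPartitionCol = isPartitionCol;
  }

  // Depends on isVirtualCol only, as the review suggests.
  public boolean getIsVirtualCol() {
    return isVirtualCol;
  }

  // Partition columns are queried through their own accessor.
  public boolean getIsPartitionCol() {
    return isPartitionCol;
  }
}
```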
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -445,9 +444,9 @@ public static List<DeleteFile> getDeleteFiles(Table table, String partitionPath)
             t -> ((PositionDeletesScanTask) t).file()));
   }

-  public static Expression generateExpressionFromPartitionSpec(Table table, Map<String, String> partitionSpec)
-      throws SemanticException {
-    Map<String, PartitionField> partitionFieldMap = getPartitionFields(table).stream()
+  public static Expression generateExpressionFromPartitionSpec(Table table, Map<String, String> partitionSpec,
+      boolean latestSpecOnly) throws SemanticException {
+    Map<String, PartitionField> partitionFieldMap = getPartitionFields(table, latestSpecOnly).stream()
Review Comment:
The method `generateExpressionFromPartitionSpec` is called from two places,
and the actual parameter value of `latestSpecOnly` is always `true`. How
about calling
```
icebergTable.spec().fields()
```
instead of
```
getPartitionFields(table, latestSpecOnly)
```
##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/DummyPartition.java:
##########
@@ -51,11 +48,15 @@ public DummyPartition(Table tbl, String name) {
     this.name = name;
   }

-  public DummyPartition(Table tbl, String name,
-      Map<String, String> partSpec) {
-    setTable(tbl);
+  public DummyPartition(Table tbl, String name, Map<String, String> partSpec) throws HiveException {
Review Comment:
Is the functionality of `DummyPartition` actually used? I saw that where we
create `DummyPartition` instances we return a `Partition`-typed reference and
never call any method defined on `DummyPartition`. If that is not the case then
`DummyPartition` is fine. The name is misleading though, since these objects
represent real partitions, don't they?
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -129,19 +129,19 @@ public static Table getTable(Configuration configuration, org.apache.hadoop.hive
    */
   static Table getTable(Configuration configuration, Properties properties, boolean skipCache) {
     String metaTable = properties.getProperty(IcebergAcidUtil.META_TABLE_PROPERTY);
-    String tableName = properties.getProperty(Catalogs.NAME);
-    String location = properties.getProperty(Catalogs.LOCATION);
+    Properties props = (metaTable != null) ? new Properties() : properties;
+
     if (metaTable != null) {
+      props.computeIfAbsent(InputFormatConfig.CATALOG_NAME, properties::get);
       // HiveCatalog, HadoopCatalog uses NAME to identify the metadata table
-      properties.setProperty(Catalogs.NAME, tableName + "." + metaTable);
+      props.computeIfAbsent(Catalogs.NAME, k -> properties.get(k) + "." + metaTable);
       // HadoopTable uses LOCATION to identify the metadata table
-      properties.setProperty(Catalogs.LOCATION, location + "#" + metaTable);
+      props.computeIfAbsent(Catalogs.LOCATION, k -> properties.get(k) + "#" + metaTable);
     }
Review Comment:
Is this a necessary change? The original code had one `if` statement; the new
one has five conditional branches.
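A sketch of what a single-branch version might look like — hypothetical: plain string keys stand in for `Catalogs.NAME` / `Catalogs.LOCATION`, and the copy-then-adjust of the properties happens entirely inside one `if`.

```java
import java.util.Properties;

public class MetaTableProps {
  // Hypothetical single-branch version: only when a metadata table is
  // requested do we copy the properties and rewrite the name/location keys.
  public static Properties forMetaTable(Properties properties, String metaTable) {
    if (metaTable == null) {
      return properties;
    }
    Properties props = new Properties();
    props.putAll(properties);
    // stand-ins for Catalogs.NAME and Catalogs.LOCATION
    props.setProperty("name", properties.getProperty("name") + "." + metaTable);
    props.setProperty("location", properties.getProperty("location") + "#" + metaTable);
    return props;
  }
}
```

The early return keeps the common (non-metadata-table) path untouched, and the single branch holds all of the metadata-table adjustments.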
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]