[GitHub] [iceberg] dramaticlly commented on a diff in pull request #8106: Core: push down filters when evaluating entries in metadata tables

via GitHub Tue, 25 Jul 2023 15:37:28 -0700


dramaticlly commented on code in PR #8106:
URL: https://github.com/apache/iceberg/pull/8106#discussion_r1274184133



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -59,6 +63,16 @@ public Schema schema() {
     return TypeUtil.join(schema, 
MetricsUtil.readableMetricsSchema(table().schema(), schema));
   }
 
+  @Override
+  public PartitionSpec spec() {
+    return specs.get(defaultSpecId);

Review Comment:
   > Should the partition spec of metadata be equal to the table
   
   I think there are 
[methods](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseMetadataTable.java#L69-L97)
 by @szehon-ho that apply an identity transform to each column found in the given 
partition spec. They were meant for predicate pushdown on metadata columns in #2926.
   
   > the metadata tables should be definitive new tables. Just like we have 
specified schema for the metadata table
   
   I am not sure I understand you. To populate both data tables and 
metadata tables, a similar (though not identical) set of underlying files needs to be 
scanned, and the two report different representations of the same data. Take 
PartitionTable, for example: we count the number of files and report it together with other columns 
aggregated at the partition level. So from what I can tell, the metadata 
partitionSpec is only useful for predicate pushdown, so that metadata scans can leverage 
the existing tools we already use when reading the original data tables.
   
   Data and metadata tables do have different schemas, but applying an identity 
transformation of the original data partition spec to the metadata table 
seems like a good option to me. Let me know what you think.
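
   To make the identity-transform idea concrete, here is a minimal, self-contained sketch. It does not use the Iceberg API: `Field` and `toIdentitySpec` are hypothetical stand-ins, and the choice to reference the partition value by the field's own id is my assumption about how the linked methods behave, not a copy of them.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IdentitySpecSketch {
  /** Hypothetical stand-in for a partition field; this is not Iceberg's PartitionField. */
  public record Field(int sourceId, int fieldId, String name, String transform) {}

  /**
   * Rewrites every field of a data-table spec as an identity partition.
   * Assumption: the metadata table exposes the already-transformed partition
   * value as a column addressed by the partition field id, so that id becomes
   * the source of the new identity field.
   */
  public static List<Field> toIdentitySpec(List<Field> dataSpec) {
    return dataSpec.stream()
        .map(f -> new Field(f.fieldId(), f.fieldId(), f.name(), "identity"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Example data-table spec: a day transform and a bucket transform.
    List<Field> dataSpec = List.of(
        new Field(1, 1000, "ts_day", "day"),
        new Field(2, 1001, "id_bucket", "bucket[16]"));

    for (Field f : toIdentitySpec(dataSpec)) {
      System.out.println(f.name() + " uses " + f.transform() + " on source " + f.sourceId());
    }
  }
}
```

   With a spec in this shape, a filter on a partition column of the metadata table could be evaluated the same way as a filter on an identity-partitioned data table, which is the reuse argued for above.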



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

