dramaticlly commented on code in PR #8106:
URL: https://github.com/apache/iceberg/pull/8106#discussion_r1274184133
##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -59,6 +63,16 @@ public Schema schema() {
return TypeUtil.join(schema,
MetricsUtil.readableMetricsSchema(table().schema(), schema));
}
+ @Override
+ public PartitionSpec spec() {
+ return specs.get(defaultSpecId);
Review Comment:
> Should the partition spec of metadata be equal to the table
I think there are
[methods](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseMetadataTable.java#L69-L97)
by @szehon-ho that apply an identity transform to each column found in the given
partition spec. They were meant for predicate pushdown on metadata columns in #2926.
> the metadata tables should be definitive new tables. Just like we have
> specified schema for the metadata table
I am not sure I understand you. To populate both data tables and metadata
tables, a similar (though not identical) set of underlying files needs to be
scanned, and the two report different representations of the same data. For
PartitionTable, say, we count the number of files and report it together with
other columns aggregated at the partition level. So from what I can tell, the
metadata partitionSpec is only useful for predicate pushdown, so that it can
leverage the existing tools we use when reading the original data tables.
Data and metadata tables do have different schemas, but using an identity
transformation of the original data partition spec in the metadata table
seems like a good option to me. Let me know what you think.
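
To make the identity-transform idea concrete, here is a rough, self-contained
sketch. It does not use Iceberg's classes: `Field`, `toMetadataSpec`, and the
`partition.` column prefix are hypothetical stand-ins chosen for illustration,
and the real logic in `BaseMetadataTable` may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentitySpecSketch {

  // Hypothetical stand-in for a partition field; NOT Iceberg's PartitionField.
  static final class Field {
    final String sourceColumn; // column the transform is applied to
    final String transform;    // e.g. "day", "bucket[16]", "identity"
    final String name;         // name of the resulting partition field

    Field(String sourceColumn, String transform, String name) {
      this.sourceColumn = sourceColumn;
      this.transform = transform;
      this.name = name;
    }
  }

  // Re-express each partition field of the data table as an identity
  // transform over the already-computed partition value that the metadata
  // table exposes. The "partition." prefix is an assumption of this sketch.
  static List<Field> toMetadataSpec(List<Field> dataSpec) {
    List<Field> metadataSpec = new ArrayList<>();
    for (Field f : dataSpec) {
      metadataSpec.add(new Field("partition." + f.name, "identity", f.name));
    }
    return metadataSpec;
  }

  public static void main(String[] args) {
    List<Field> dataSpec = new ArrayList<>();
    dataSpec.add(new Field("ts", "day", "ts_day"));
    dataSpec.add(new Field("id", "bucket[16]", "id_bucket"));

    // Print the derived metadata-table spec, one field per line.
    for (Field f : toMetadataSpec(dataSpec)) {
      System.out.println(f.transform + "(" + f.sourceColumn + ") as " + f.name);
    }
  }
}
```

The point of the sketch is that a filter on a partition value can be evaluated
directly against the metadata table's partition column, without re-applying the
original transform.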
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]