rdblue commented on a change in pull request #1744:
URL: https://github.com/apache/iceberg/pull/1744#discussion_r520229507
##########
File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java
##########
@@ -95,7 +95,7 @@ public Schema record(Schema record, List<String> names, List<Schema> fields) {
     if (hasChange) {
       return copyRecord(record, filteredFields);
-    } else if (filteredFields.size() == record.getFields().size()) {
+    } else if (record.getFields().size() != 0 && filteredFields.size() == record.getFields().size()) {

Review comment:
I think we need a different operation that is the opposite of
`GetProjectedIds`, like `ProjectFromIds`.

`TypeUtil.select` uses this class, `PruneColumns`, but it behaves like a SQL
`SELECT`. If I have a schema `a int, b struct<x double, y double>, c string`
and I select `b`, then everything underneath `b` is selected, which is what
you'd expect from `SELECT b FROM table`.

If we were to update `GetProjectedIds` with the logic above, then projecting
`b struct<>` (which you can't do by naming columns) would actually result in
the full struct getting projected, because the logic here selects all of `b`.
So this class cannot be used to reconstruct a schema from the result of
`GetProjectedIds`.

I think that we also need a `BuildProjection` that does the opposite of
`GetProjectedIds`, including the update to add empty structs. Then the datum
reader could use that logic to prune the Avro schema and get an exact match
with the expected schema.
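
To make the difference concrete, here is a minimal sketch that does not use
Iceberg or Avro at all. `Field`, `selectLike`, and `projectExact` are
hypothetical names invented for illustration, and the field IDs (a=1, b=2,
c=3, x=4, y=5) are assumed. It only contrasts the two behaviors: a
`SELECT`-style prune that keeps everything under a selected struct, and an
exact projection that keeps only the IDs it was given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProjectionSketch {
  // Hypothetical stand-in for a schema field; children == null marks a primitive.
  static final class Field {
    final int id;
    final String name;
    final String type;
    final List<Field> children;

    Field(int id, String name, String type, List<Field> children) {
      this.id = id;
      this.name = name;
      this.type = type;
      this.children = children;
    }

    boolean isStruct() {
      return children != null;
    }
  }

  static Field primitive(int id, String name, String type) {
    return new Field(id, name, type, null);
  }

  static Field struct(int id, String name, Field... children) {
    return new Field(id, name, "struct", Arrays.asList(children));
  }

  // SELECT-like pruning: a selected struct keeps everything underneath it.
  static List<Field> selectLike(List<Field> fields, Set<Integer> ids) {
    List<Field> result = new ArrayList<>();
    for (Field f : fields) {
      if (ids.contains(f.id)) {
        result.add(f);
      } else if (f.isStruct()) {
        List<Field> kept = selectLike(f.children, ids);
        if (!kept.isEmpty()) {
          result.add(new Field(f.id, f.name, f.type, kept));
        }
      }
    }
    return result;
  }

  // Exact projection: a struct keeps only the children whose IDs were requested,
  // so a struct ID with no child IDs yields an empty struct.
  static List<Field> projectExact(List<Field> fields, Set<Integer> ids) {
    List<Field> result = new ArrayList<>();
    for (Field f : fields) {
      if (f.isStruct()) {
        List<Field> kept = projectExact(f.children, ids);
        if (ids.contains(f.id) || !kept.isEmpty()) {
          result.add(new Field(f.id, f.name, f.type, kept));
        }
      } else if (ids.contains(f.id)) {
        result.add(f);
      }
    }
    return result;
  }

  // Renders fields in the same notation the comment uses, e.g. "b struct<x double>".
  static String describe(List<Field> fields) {
    StringBuilder sb = new StringBuilder();
    for (Field f : fields) {
      if (sb.length() > 0) {
        sb.append(", ");
      }
      sb.append(f.name).append(' ');
      if (f.isStruct()) {
        sb.append("struct<").append(describe(f.children)).append('>');
      } else {
        sb.append(f.type);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Field> schema = Arrays.asList(
        primitive(1, "a", "int"),
        struct(2, "b", primitive(4, "x", "double"), primitive(5, "y", "double")),
        primitive(3, "c", "string"));
    Set<Integer> onlyB = new HashSet<>(Arrays.asList(2));

    System.out.println(describe(selectLike(schema, onlyB)));   // b struct<x double, y double>
    System.out.println(describe(projectExact(schema, onlyB))); // b struct<>
  }
}
```

With only the ID of `b` requested, the `SELECT`-style version returns the full
struct, while the exact version returns `b struct<>`, which is the shape the
datum reader would need in order to match the expected schema.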