rdblue commented on a change in pull request #2276:
URL: https://github.com/apache/iceberg/pull/2276#discussion_r588806632
##########
File path: api/src/main/java/org/apache/iceberg/FileScanTask.java
##########
@@ -47,6 +47,13 @@
*/
PartitionSpec spec();
+ /**
+ * The partition data for the file of this task.
Review comment:
What I'm thinking is that you might want to restrict combining by only one or
two partition fields, not by the full partition tuple. For example, say the
query's join condition is `a.id = b.id` and table `a` is partitioned by
`day(a.ts), bucket(128, a.id)`. Then it's fine to combine splits from different
date partitions, just not from different buckets. In that case, the splits that
we produce should have a bucket partition field, but not a date partition field.

It's a little confusing that I made this comment on `FileScanTask`, which is
always in a single partition. But the `Task` API still has the same concern: we
may want to avoid combining across only a subset of partition fields, and the
values of those fields are what we would return through a `partition` method.
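
To make the bucket-versus-date example concrete, here is a minimal sketch of combining tasks by a subset of partition fields. It is not Iceberg code: `Task`, `combineByBucket`, and the sample file names are hypothetical stand-ins, and a real implementation would also bound the size of each combined split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionAwareCombine {
  // Hypothetical stand-in for a scan task: a file plus its partition values
  // under a spec like day(ts), bucket(128, id).
  public static final class Task {
    final String file;
    final int day;
    final int bucket;

    public Task(String file, int day, int bucket) {
      this.file = file;
      this.day = day;
      this.bucket = bucket;
    }
  }

  // Group tasks by bucket only. The day value is deliberately ignored, so
  // tasks from different date partitions may land in the same group, but
  // tasks from different buckets never do.
  public static Map<Integer, List<Task>> combineByBucket(List<Task> tasks) {
    Map<Integer, List<Task>> groups = new LinkedHashMap<>();
    for (Task task : tasks) {
      groups.computeIfAbsent(task.bucket, b -> new ArrayList<>()).add(task);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Task> tasks = List.of(
        new Task("f1.parquet", 18690, 7),
        new Task("f2.parquet", 18691, 7),   // different day, same bucket
        new Task("f3.parquet", 18690, 42)); // same day, different bucket

    combineByBucket(tasks).forEach((bucket, group) ->
        System.out.println("bucket=" + bucket + " files=" + group.size()));
  }
}
```

In this sketch the map key plays the role of the value a `partition` method would return for a combined task: it carries the bucket field and no date field.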