openinx commented on a change in pull request #2276:
URL: https://github.com/apache/iceberg/pull/2276#discussion_r590959223
##########
File path: api/src/main/java/org/apache/iceberg/TableScan.java
##########
@@ -181,6 +181,34 @@ default TableScan select(String... columns) {
*/
CloseableIterable<CombinedScanTask> planTasks();
+ /**
+ * Create a new {@link TableScan} which indicate that when plan tasks via the
+ * {@link #planTasks()}, the scan should preserve partition boundary specified by the provided
+ * partition column names. In other words, the scan will not attempt to combine tasks whose input
+ * files have different partition data w.r.t `columns`.
+ *
+ * @param columns the partition column names to preserve boundary when planning tasks
+ * @return a table scan preserving partition boundary when planning tasks
+ * @throws IllegalArgumentException if any of the input columns is not a partition column, or
+ * if the table is unpartitioned, or `columns` is empty.
+ */
+ TableScan preservePartitions(Collection<String> columns);
Review comment:
I agree that it's great to provide an API in `TableScan` to plan tasks
grouped by partition. We actually have a lot of duplicated code in
[RewriteDataFilesAction](https://github.com/apache/iceberg/blob/e2a0ba4e34d5f5d3cc80a2a659d50651980fe1b1/core/src/main/java/org/apache/iceberg/actions/BaseRewriteDataFilesAction.java#L212)
to rewrite data files per partition. Once this API is introduced, I think we
could simplify the logic of the rewrite action (in format v2, we will have more
rewrite actions).
One question: in what case would we want to group tasks by a subset of the
partition columns? In my mind, we usually group the tasks by the full set of
partition columns.
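
To make the question concrete, here is a minimal sketch in plain Java of how
grouping by the full set of partition columns differs from grouping by a
subset. It does not use the Iceberg API: `FileEntry`, `groupBy`, and the
`dt`/`hour` partition columns are hypothetical names chosen for illustration,
and the validation only loosely mirrors the `IllegalArgumentException` cases
described in the Javadoc above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionGrouping {

  // Hypothetical stand-in for a data file and its partition values (not an Iceberg class).
  static final class FileEntry {
    final String path;
    final Map<String, String> partition;

    FileEntry(String path, Map<String, String> partition) {
      this.path = path;
      this.partition = partition;
    }
  }

  // Groups file paths by the values of the given partition columns.
  // Files that differ in any of those columns never share a group.
  static Map<List<String>, List<String>> groupBy(List<FileEntry> files, List<String> columns) {
    if (columns.isEmpty()) {
      throw new IllegalArgumentException("columns must not be empty");
    }
    Map<List<String>, List<String>> groups = new LinkedHashMap<>();
    for (FileEntry file : files) {
      List<String> key = new ArrayList<>();
      for (String column : columns) {
        if (!file.partition.containsKey(column)) {
          throw new IllegalArgumentException("Not a partition column: " + column);
        }
        key.add(file.partition.get(column));
      }
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(file.path);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<FileEntry> files = List.of(
        new FileEntry("a.parquet", Map.of("dt", "2021-03-01", "hour", "00")),
        new FileEntry("b.parquet", Map.of("dt", "2021-03-01", "hour", "01")),
        new FileEntry("c.parquet", Map.of("dt", "2021-03-02", "hour", "00")));

    // Full set of partition columns: one group per distinct (dt, hour) pair.
    System.out.println(groupBy(files, List.of("dt", "hour")));
    // Subset: files sharing the same dt fall into one group regardless of hour.
    System.out.println(groupBy(files, List.of("dt")));
  }
}
```

With a subset, files from different `hour` partitions may end up in the same
group, which only seems useful if a caller wants coarser groups than the
table's own partitioning.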