[GitHub] [iceberg] sunchao commented on a change in pull request #2276: Spark: Add option to combine tasks by partition

GitBox Fri, 26 Feb 2021 12:41:14 -0800


sunchao commented on a change in pull request #2276:
URL: https://github.com/apache/iceberg/pull/2276#discussion_r583905826




##########
File path: api/src/main/java/org/apache/iceberg/TableScan.java
##########
@@ -209,4 +209,9 @@ default TableScan select(String... columns) {
    */
   boolean isCaseSensitive();
 
+  /**
+   * Returns the target split size for this scan.
+   */
+  long targetSplitSize();

Review comment:
       Let me know if this change is too invasive. This method is currently 
private in `BaseTableScan`, but both `SparkBatchQueryScan` and 
`SparkMergeScan` need to know the scan-specific split size when planning 
tasks, so I exposed it on the `TableScan` interface.
   
   Another approach is to move all of the `planTasks` logic into the scan 
implementations. However, the combine-tasks-by-partition feature needs to 
group scan tasks by partition first, rather than returning them lazily 
through an iterator, and I'm not sure whether that is acceptable. 
`SparkMergeScan` also seems to have re-implemented its own task planning 
logic.
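
   To make the grouping concern concrete, here is a minimal sketch of combining tasks by partition against a target split size. `Task`, `combineByPartition`, and the greedy packing are hypothetical stand-ins for illustration only; they are not Iceberg classes and not the code in this PR. The point is that all tasks have to be materialized and grouped before any combined task can be emitted, which is why a purely iterator-based plan does not fit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionCombineSketch {
  // Hypothetical stand-in for a file scan task: a partition key and a size in bytes.
  static final class Task {
    final String partition;
    final long sizeBytes;

    Task(String partition, long sizeBytes) {
      this.partition = partition;
      this.sizeBytes = sizeBytes;
    }
  }

  // Groups tasks by partition, then greedily packs each partition's tasks
  // into combined groups whose total size stays within targetSplitSize.
  // A single task larger than the target still gets a group of its own.
  static List<List<Task>> combineByPartition(List<Task> tasks, long targetSplitSize) {
    // Step 1: materialize everything and group by partition (not lazy).
    Map<String, List<Task>> byPartition = new LinkedHashMap<>();
    for (Task task : tasks) {
      byPartition.computeIfAbsent(task.partition, k -> new ArrayList<>()).add(task);
    }

    // Step 2: pack within each partition, never mixing partitions.
    List<List<Task>> combined = new ArrayList<>();
    for (List<Task> group : byPartition.values()) {
      List<Task> current = new ArrayList<>();
      long currentSize = 0L;
      for (Task task : group) {
        if (!current.isEmpty() && currentSize + task.sizeBytes > targetSplitSize) {
          combined.add(current);
          current = new ArrayList<>();
          currentSize = 0L;
        }
        current.add(task);
        currentSize += task.sizeBytes;
      }
      if (!current.isEmpty()) {
        combined.add(current);
      }
    }
    return combined;
  }
}
```

   In this sketch the caller has to pass in the split size, which mirrors why the scan-specific value needs to be visible to whatever code does the grouping.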




