singhpk234 commented on code in PR #15184:
URL: https://github.com/apache/iceberg/pull/15184#discussion_r2747133572
##########
core/src/main/java/org/apache/iceberg/SupportsDistributedScanPlanning.java:
##########
@@ -18,5 +18,9 @@
*/
package org.apache.iceberg;
-/** Marker interface to indicate whether a Table requires remote scan planning
*/
-public interface RequiresRemoteScanPlanning {}
+/** Marker interface to indicate whether a Table supports distributed scan
planning */
Review Comment:
minor : logically its no longer a marker interface in a sense if Table
implements its not suffiient we need to inspect the API below, how about in the
doc we explain what Distributed planning means (i believe its easy to confuse
between DistributedPlanning and Remote Planning) and then in the API below
describe what does true and false imply, mostly thinking from POV of how
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
is modeled
##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -761,7 +761,8 @@ public StructType readSchema() {
}
private BatchScan newBatchScan() {
- if (table instanceof RequiresRemoteScanPlanning) {
+ if (table instanceof SupportsDistributedScanPlanning
+ && !((SupportsDistributedScanPlanning)
table).allowDistributedPlanning()) {
return table.newBatchScan();
} else if (table instanceof BaseTable &&
readConf.distributedPlanningEnabled()) {
return new SparkDistributedDataScan(spark, table, readConf);
Review Comment:
we would need to additionally check this as well
```
((SupportsDistributedScanPlanning) table).allowDistributedPlanning())
```
may be we can restructure this whole if else logic ^^
```
readConf.distributedPlanningEnabled()
```
we might need to update the docs for this as well, in a sense now this is no
longer sufficient condition to enforce distributed planning ? i am mostly
thinking from custom BaseTable POV
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]