singhpk234 commented on code in PR #15184:
URL: https://github.com/apache/iceberg/pull/15184#discussion_r2747133572


##########
core/src/main/java/org/apache/iceberg/SupportsDistributedScanPlanning.java:
##########
@@ -18,5 +18,9 @@
  */
 package org.apache.iceberg;
 
-/** Marker interface to indicate whether a Table requires remote scan planning 
*/
-public interface RequiresRemoteScanPlanning {}
+/** Marker interface to indicate whether a Table supports distributed scan 
planning */

Review Comment:
   minor : logically its no longer a marker interface in a sense if Table 
implements its not suffiient we need to inspect the API below, how about in the 
doc we explain what Distributed planning means (i believe its easy to confuse 
between DistributedPlanning and Remote Planning) and then in the API below 
describe what does true and false imply, mostly thinking from POV of how 
   
   
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
 is modeled



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -761,7 +761,8 @@ public StructType readSchema() {
   }
 
   private BatchScan newBatchScan() {
-    if (table instanceof RequiresRemoteScanPlanning) {
+    if (table instanceof SupportsDistributedScanPlanning
+        && !((SupportsDistributedScanPlanning) 
table).allowDistributedPlanning()) {
       return table.newBatchScan();
     } else if (table instanceof BaseTable && 
readConf.distributedPlanningEnabled()) {
       return new SparkDistributedDataScan(spark, table, readConf);

Review Comment:
   we would need to additionally check this as well 
   
   ```
   ((SupportsDistributedScanPlanning) table).allowDistributedPlanning())
   ```
   may be we can restructure this whole if else logic ^^
   
   
   ```
   readConf.distributedPlanningEnabled()
   ```
   
   we might need to update the docs for this as well, in a sense now this is no 
longer sufficient condition to enforce distributed planning ? i am mostly 
thinking from custom BaseTable POV



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to