amogh-jahagirdar commented on code in PR #14867:
URL: https://github.com/apache/iceberg/pull/14867#discussion_r2683382828


##########
core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java:
##########
@@ -37,12 +37,107 @@ private RESTCatalogProperties() {}
 
   public static final String NAMESPACE_SEPARATOR = "namespace-separator";
 
-  // Enable planning on the REST server side
-  public static final String REST_SCAN_PLANNING_ENABLED = "rest-scan-planning-enabled";
-  public static final boolean REST_SCAN_PLANNING_ENABLED_DEFAULT = false;
+  // Configure scan planning mode
+  // Can be set by server in LoadTableResponse.config() or by client in catalog properties
+  // Negotiation rules: ONLY beats PREFERRED, both PREFERRED = client wins
+  // Default when neither client nor server provides: client-preferred
+  public static final String SCAN_PLANNING_MODE = "scan-planning-mode";
+  public static final String SCAN_PLANNING_MODE_DEFAULT =
+      ScanPlanningMode.CLIENT_PREFERRED.modeName();
 
   public enum SnapshotMode {
     ALL,
     REFS
   }
+
+  /**
+   * Enum to represent scan planning mode configuration.
+   *
+   * <p>Can be configured by:
+   *
+   * <ul>
+   *   <li>Server: Returned in LoadTableResponse.config() to advertise server preference/requirement
+   *   <li>Client: Set in catalog properties to set client preference/requirement
+   * </ul>
+   *
+   * <p>When both client and server configure this property, the values are negotiated:
+   *
+   * <p>Values:
+   *
+   * <ul>
+   *   <li>CLIENT_ONLY - MUST use client-side planning. Fails if paired with CATALOG_ONLY from other

Review Comment:
   >I am using PyIceberg and I know I am low on resources, so it's better that I 
just do remote planning if possible and the table is big. PyIceberg can say it 
prefers that the catalog do the planning, and the server, based on catalog_only / 
catalog_preferred, can have that negotiation.
   
   Yeah, I guess I'm mainly coming from the perspective that if a user is 
running PyIceberg in a low-resource environment, the user would either 
knowingly and explicitly configure the client property to use remote planning, or 
PyIceberg would internally choose which planning it wants when it's optional 
(could be something simple like always doing client planning, could be 
heuristics-based; it's all up to client implementations). 
   
   It's nice that the server could use this as a dynamic mechanism to control 
planning based on load, but I think there are already mechanisms for that. A 
server could just throttle a client-initiated planning request, and the client 
could then fall back to client-side planning, for instance. This doesn't require 
additional protocol complexity to support today (I believe).
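
To make the throttle-and-fall-back idea concrete, here is a minimal sketch. Everything in it is hypothetical: `PlanningFallback`, `ThrottledException`, and `planWithFallback` are illustrative names, not existing Iceberg APIs, and a real client would react to whatever throttling signal the server actually returns.

```java
import java.util.List;
import java.util.function.Supplier;

public class PlanningFallback {
  // Hypothetical signal that the server refused or throttled remote planning.
  static class ThrottledException extends RuntimeException {
    ThrottledException(String message) {
      super(message);
    }
  }

  // Try remote planning first; if the server throttles, plan locally instead.
  static List<String> planWithFallback(
      Supplier<List<String>> remotePlanner, Supplier<List<String>> localPlanner) {
    try {
      return remotePlanner.get();
    } catch (ThrottledException e) {
      return localPlanner.get();
    }
  }

  public static void main(String[] args) {
    List<String> tasks =
        planWithFallback(
            () -> {
              throw new ThrottledException("server busy");
            },
            () -> List.of("local-task-1", "local-task-2"));
    System.out.println(tasks);
  }
}
```

The point of the sketch is that the fallback decision lives entirely in the client, so no extra negotiation needs to be expressed in the protocol.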
   
   >Let's say I am Spark and I have big compute infra, but based on the current 
workload,
   let's say an environment with a lot of concurrent queries, I will not have a 
lot of memory available to plan this, so I would start by saying I prefer catalog.
   Let's say I have a dedicated cluster: rather than doing a remote plan I would 
do it in my JVM, so I would say client_only from the client side.
   
   Yeah, same principle as the PyIceberg case imo. I feel like in these 
circumstances a user would either explicitly configure this or, if we need a 
little more dynamism based on server/client load, we'd build that logic 
directly into the client without spec'ing out preferences.
   
   As far as I can tell, the main benefit of codifying preferences in the spec 
is that it standardizes client behavior when the endpoint is optional rather 
than required (i.e. we know exactly what PyIceberg, Java, Rust, etc. would do 
given some combination of options in that matrix). With my approach, 
there'd be deviation in client behavior across different implementations, but I 
personally think that's kind of an advantage in this case.
   
   I personally don't feel like that standardization is super useful, but as I 
said, I'm willing to move forward here, since these additional options aren't 
_that_ complicated for clients to implement and I can see some benefit to 
standardizing behavior across clients.
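
For reference, the negotiation rules quoted in the diff comments above (ONLY beats PREFERRED, both PREFERRED means the client wins, default is client-preferred) could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `negotiate` helper, the `Mode` enum, and the `CATALOG_PREFERRED` constant name are made up for this sketch, and using the configured side's value when only one side sets the property is also an assumption.

```java
import java.util.Optional;

public class ScanPlanningNegotiation {
  // Hypothetical stand-in for the ScanPlanningMode enum discussed in the PR.
  enum Mode {
    CLIENT_ONLY,
    CLIENT_PREFERRED,
    CATALOG_PREFERRED,
    CATALOG_ONLY;

    boolean isOnly() {
      return this == CLIENT_ONLY || this == CATALOG_ONLY;
    }
  }

  static Mode negotiate(Optional<Mode> client, Optional<Mode> server) {
    // Default when neither side provides a value: client-preferred.
    if (!client.isPresent() && !server.isPresent()) {
      return Mode.CLIENT_PREFERRED;
    }
    // Assumption: if only one side is configured, its value is used as-is.
    if (!client.isPresent()) {
      return server.get();
    }
    if (!server.isPresent()) {
      return client.get();
    }

    Mode c = client.get();
    Mode s = server.get();
    // Conflicting hard requirements cannot be reconciled.
    if (c.isOnly() && s.isOnly() && c != s) {
      throw new IllegalStateException("Incompatible planning modes: " + c + " vs " + s);
    }
    // ONLY beats PREFERRED.
    if (c.isOnly()) {
      return c;
    }
    if (s.isOnly()) {
      return s;
    }
    // Both PREFERRED: the client wins.
    return c;
  }

  public static void main(String[] args) {
    System.out.println(
        negotiate(Optional.of(Mode.CLIENT_PREFERRED), Optional.of(Mode.CATALOG_ONLY)));
  }
}
```

Writing the matrix out this way shows how small the client-side logic is, which is the "not _that_ complicated" part of the trade-off.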



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

