RussellSpitzer commented on code in PR #15372:
URL: https://github.com/apache/iceberg/pull/15372#discussion_r2833989897


##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -247,6 +247,15 @@ private TableProperties() {}
   public static final String DELETE_PLANNING_MODE = "read.delete-planning-mode";
   public static final String PLANNING_MODE_DEFAULT = PlanningMode.AUTO.modeName();
 
+  /**
+   * When true, declares that the table's identifier fields can be relied upon as a primary key by
+   * query engines for optimization purposes (e.g. eliminating redundant joins or distinct). This is
+   * not enforced at write time and does not validate existing data.
+   */
+  public static final String READ_IDENTIFIER_FIELDS_RELY = "read.identifier-fields.rely";

Review Comment:
   Specific to reads: Yes. The spec already says this isn't enforced, and 
although we could enforce it, that seems like a heavy ask to add here. This is 
similar to the "RELY" keyword that exists in Snowflake, Databricks Runtime, and 
others: there is no guarantee of enforcement or validation, but it tells the 
query engine to optimize on the assumption that the constraint is correct.
   
   For Property: I have mixed feelings here, but I think a table property is 
the right way to go. For Spark, we have to define this at catalog load, so it 
has to be either an engine-specific property set in the Spark session or a 
table property. A table property feels like a better fit because users probably 
want to enable this per table; it's not something that should change from one 
application run to the next. A table property is also nice for cross-engine 
compatibility, since we can then hint to multiple engines that understand this 
concept.
   
   For Engine Specific: Since multiple engines support this concept, I think 
it's better to have it as a global property. We could prefix it with spark, but 
I'm not sure why we would want to do that if other engines want to take 
advantage of it. Specifically, I'm thinking about Snowflake.
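   
   To make the engine-side behavior concrete, here is a rough sketch of how a 
reader might consult the property before applying a key-based optimization. It 
is not code from this PR: the class name, the helper method, and the default of 
`false` when the property is unset are all assumptions for illustration. Only 
the property name comes from the diff above.

```java
import java.util.Map;

// Hypothetical sketch, not part of the PR: shows how an engine could gate
// primary-key-based optimizations on the proposed table property.
class RelyHintSketch {
  // Property name as proposed in the diff above.
  static final String READ_IDENTIFIER_FIELDS_RELY = "read.identifier-fields.rely";

  // Assumption: when the property is unset, the engine does not rely on
  // identifier fields, since nothing enforces or validates them.
  static final boolean READ_IDENTIFIER_FIELDS_RELY_DEFAULT = false;

  // Returns true only if the table explicitly opts in via its properties.
  static boolean canRelyOnIdentifierFields(Map<String, String> tableProperties) {
    String value = tableProperties.get(READ_IDENTIFIER_FIELDS_RELY);
    if (value == null) {
      return READ_IDENTIFIER_FIELDS_RELY_DEFAULT;
    }
    return Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> optedIn = Map.of(READ_IDENTIFIER_FIELDS_RELY, "true");
    Map<String, String> unset = Map.of();

    System.out.println(canRelyOnIdentifierFields(optedIn)); // prints "true"
    System.out.println(canRelyOnIdentifierFields(unset));   // prints "false"
  }
}
```

   Because the check reads only table properties, any engine that understands 
the concept could apply the same test without a session-level setting.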



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

