RussellSpitzer commented on code in PR #15372:
URL: https://github.com/apache/iceberg/pull/15372#discussion_r2833989897
##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -247,6 +247,15 @@ private TableProperties() {}
public static final String DELETE_PLANNING_MODE = "read.delete-planning-mode";
public static final String PLANNING_MODE_DEFAULT = PlanningMode.AUTO.modeName();
+ /**
+ * When true, declares that the table's identifier fields can be relied upon as a primary key by
+ * query engines for optimization purposes (e.g. eliminating redundant joins or distinct). This is
+ * not enforced at write time and does not validate existing data.
+ */
+ public static final String READ_IDENTIFIER_FIELDS_RELY = "read.identifier-fields.rely";
Review Comment:
Specific to reads: Yes. The spec already states that it isn't enforced, and
although we could enforce it, that seems like a heavy ask to add here. This is
similar to the "RELY" keyword that exists in Snowflake, Databricks Runtime and
others: there is no guarantee of enforcement or validation, but it tells the
query engine to optimize on the assumption that the constraint is correct.
For Property: I have mixed feelings here, but I think a table property is the
right way to go. For Spark, we have to define this at catalog load, so it has to
be either an engine-specific property set in the Spark session or a table
property. A table property feels like a better fit because users probably want
to enable this per table; it's not something that should change from application
run to application run. A table property is also nice for cross-engine
compatibility, since we can then hint to multiple engines that understand this
concept.
For Engine Specific: Since multiple engines support this concept, I think
it's better to have it as a global property. We could prefix it with spark, but
I'm not sure why we would want to do that if other engines want to take
advantage of it. Specifically for me, I'm thinking about Snowflake.
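
To make the "hint, not enforcement" semantics concrete, here is a rough sketch of how an engine might consult the property when deciding whether to apply a key-based optimization. Only the property name comes from the diff above. The `RelyHint` class, the `canRelyOnIdentifierFields` helper, the plain `Map` standing in for the table's properties, and the default of `false` are all assumptions for illustration, not the PR's implementation.

```java
import java.util.Map;
import java.util.Set;

public class RelyHint {
  // Property name taken from the diff above.
  static final String READ_IDENTIFIER_FIELDS_RELY = "read.identifier-fields.rely";

  // Assumed default: the hunk shown here does not define one.
  static final boolean ASSUMED_DEFAULT = false;

  /**
   * Hypothetical helper: returns true only when the table declares identifier
   * fields AND opts in through the table property. Nothing here validates the
   * data; the engine simply trusts the declaration.
   */
  static boolean canRelyOnIdentifierFields(
      Map<String, String> tableProperties, Set<Integer> identifierFieldIds) {
    if (identifierFieldIds.isEmpty()) {
      return false; // no identifier fields, so there is nothing to rely on
    }
    String value = tableProperties.get(READ_IDENTIFIER_FIELDS_RELY);
    return value == null ? ASSUMED_DEFAULT : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Set<Integer> identifierFieldIds = Set.of(1);
    Map<String, String> optedIn = Map.of(READ_IDENTIFIER_FIELDS_RELY, "true");

    // Opted-in table: the engine may treat the identifier fields as a key.
    System.out.println(canRelyOnIdentifierFields(optedIn, identifierFieldIds));
    // Property absent: the engine falls back to the assumed default.
    System.out.println(canRelyOnIdentifierFields(Map.of(), identifierFieldIds));
  }
}
```

Because the flag lives with the table rather than in a Spark session config, any engine that understands the concept could run the same check.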
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]