michalheld commented on code in PR #15193:
URL: https://github.com/apache/iceberg/pull/15193#discussion_r2905697077


##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -95,7 +95,12 @@ public Long snapshotId() {
   }
 
   public Long asOfTimestamp() {
-    return 
confParser.longConf().option(SparkReadOptions.AS_OF_TIMESTAMP).parseOptional();
+    SparkConfParser.LongConfParser longParser =
+        confParser.longConf().option(SparkReadOptions.AS_OF_TIMESTAMP);
+    if (snapshotId() == null && branch == null && tag() == null) {
+      longParser.sessionConf(SparkSQLProperties.AS_OF_TIMESTAMP);
+    }
+    return longParser.parseOptional();

Review Comment:
   The idea was that explicit tag/branch/snapshot/as-of-timestamp — whether as 
a table property, table ref, or in SQL — should take precedence over the 
session-level time-travel property, which would be silently ignored. Examples 
of this can be found in 
org.apache.iceberg.spark.sql.TestSelect#testSessionPropertyWithMultiTableJoin. 
   Other test cases also show that if a specific tag or branch is defined for a 
table, the session property is ignored and data is returned from the tag or 
branch snapshot. This seems like reasonable behaviour to me, but I'm open to 
other opinions.
   
   I tried removing these conditions to understand how behaviour would change. 
It turns out this method currently always returns null, because as-of-timestamp 
is always pre-resolved to snapshot-id and stripped from the options before 
SparkReadConf is constructed. Removing the guard causes a conflict between 
snapshot-id and as-of-timestamp whenever snapshot, tag, branch, or 
as-of-timestamp are specified at the table level — either in SQL or via the 
DataFrame API.
   
   This leaves us with three options:
   1. Keep the current conditions and document that the session property is 
only applied when no table-level time-travel is set, and silently ignored 
otherwise.
   2. Change the implementation so that a table-level as-of-timestamp takes 
precedence over the session property, while still failing if the session 
property is combined with a branch or tag reference. This would require a much 
more complex implementation, moving the resolution logic into the places where 
the as-of-timestamp is currently pre-resolved to snapshot-id.
   3. Fail the query when the session property is set, and the query also 
specifies a table-level snapshot, as-of-timestamp, branch, or tag. However, 
removing the conditions as-is would always produce the error "Cannot set both 
snapshot-id and as-of-timestamp to select which table snapshot to scan", since 
all table-level time-travel options pre-resolve to snapshot-id, which can be a 
confusing message in this context.
   
   I personally prefer option 1.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to