amogh-jahagirdar commented on code in PR #7593:
URL: https://github.com/apache/iceberg/pull/7593#discussion_r1192863722


##########
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec.scala:
##########
@@ -41,6 +41,9 @@ case class CreateOrReplaceBranchExec(
   override protected def run(): Seq[InternalRow] = {
     catalog.loadTable(ident) match {
       case iceberg: SparkTable =>
+        if (iceberg.table.currentSnapshot() == null) {
+          throw new UnsupportedOperationException(s"The Iceberg table: 
$iceberg" + " has no snapshots")
+        }

Review Comment:
   We definitely should throw a better exception here thanks for identifying 
this issue! Few points:
   
    1.) I don't think the fix is correct since it is a valid case that there is 
no `main` branch and thus table.currentSnapshot() is null, and the user 
specifies an explicit snapshot to create from some other non-main branch. We 
don't want to throw unnecessarily
   
   So the commit graph looks something like 
   
   ```                   
   (empty main table state)
   
                        S4 (Branch "b")
                            /
   S1 -> S2 -> S3 (Branch "a")
   ```
   
   Creating branch b should still succeed but with this implementation we'll 
fail unnecessarily.
   
   So in the core library invalid snapshots are prevented here 
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1181
   
   but this case is a bit different just because in the spark procedure we 
default to choosing the latest main snapshot if a snapshot is not specified. I 
think one simple way forward is assign snapshotId to a `Long` which can be null 
in case table.currentSnapshot() is null. Then before the API call to create the 
branch, throw our own exception in case the current snapshot is null. 
   
   2.) `UnsupportedOperationException` is not the right exception imo. I think 
we should be throwing an `IllegalArgumentException` to indicate that there is 
no latest main snapshot, and that the user should specify an explicit snapshot.
   
   3.) Could we just keep the PR focused to 3.4 and also add unit tests to 
cover this case? We can later backport and address the other DDLs



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to