rdblue commented on PR #6651:
URL: https://github.com/apache/iceberg/pull/6651#issuecomment-1433928069
> 1. We won't really be able to write tests for the `SparkPositionDeltaWrite` until the SQL conf change is in, because this PR just focuses on data frame write options.

I think to solve this, we should add back the Spark SQL session write config. The problem I raised in [my comment](https://github.com/apache/iceberg/pull/6651#discussion_r1101931339) was only a problem for the read option. If we remove the read session config, then we can avoid it. But the write config doesn't have the same issue (because writes automatically handle cases where the branch doesn't exist), and adding it back would allow us to run the tests. A rough sketch of the option/session-conf fallback is at the end of this comment.

> 2. Behavior of createOrReplaceTable when the branch write option is passed in. Say, for example, a user does the following:
>
> ```
> data.writeTo("prod.db.table")
>     .tableProperty("write.format.default", "orc")
>     .partitionedBy($"level", days($"ts"))
>     .writeOption(SparkWriteOptions.BRANCH, "branch")
>     .createOrReplace()
> ```
>
> What should the desired behavior be? Currently, the write is performed on the branch and, if it's a replace, the main table state stays the same, so the `replace` loses its meaning. Another possible behavior is that the branch is written to and `replace` actually performs a branch replacement, replacing the head of main with the head of the target branch. This is what makes sense to me, but I think it'll be hard to achieve given the catalog/planner abstractions.

I think the simplest answer is that we can just disallow `branch` with staged Spark tables. The branch should be passed through `LogicalWriteInfo`, so we can disable this by adding a method to `StagedSparkTable` that is used for atomic create/replace:

```java
@Override
public WriteBuilder newWriteBuilder(LogicalWriteInfo info) {
  String branch = info.options().get(SparkWriteOptions.BRANCH);
  if (branch != null && !branch.equalsIgnoreCase(SnapshotRef.MAIN_BRANCH)) {
    throw new UnsupportedOperationException(
        String.format("Cannot create or replace table with branch: %s", branch));
  }

  return super.newWriteBuilder(info);
}
```

Eventually, we may want to handle this case and decide on behavior. The `create`, `replace`, and `createOrReplace` methods in `DataFrameWriterV2` completely replace the table state, including the schema and partitioning. So the correct behavior for a branch is to add the new schema, partition spec, and properties, then create a new snapshot and set the branch state to that snapshot. This would be similar to the behavior in `TableMetadata.buildReplacement`, but slightly different because it would not need to set the table's current schema and partition spec, just the ones for the branch (see the second sketch below). We should definitely not handle this right away and should instead just disable it.
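For point 1, here is a minimal sketch of how the write path could resolve the branch once the session config is back: the write option takes precedence, then the SQL session conf, then `main`. The property name `spark.sql.iceberg.write.branch` and the `BranchResolver` class are placeholders, not actual Iceberg names:

```java
import org.apache.iceberg.SnapshotRef;
import org.apache.iceberg.spark.SparkWriteOptions;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

class BranchResolver {
  // Hypothetical session property; a real key would be defined alongside
  // the other Iceberg Spark SQL properties.
  private static final String WRITE_BRANCH_CONF = "spark.sql.iceberg.write.branch";

  // Resolve the target branch: write option first, then session conf, then main.
  static String resolve(SparkSession spark, CaseInsensitiveStringMap options) {
    String sessionBranch = spark.conf().get(WRITE_BRANCH_CONF, SnapshotRef.MAIN_BRANCH);
    return options.getOrDefault(SparkWriteOptions.BRANCH, sessionBranch);
  }
}
```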

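For the eventual branch-aware replacement, here is a rough sketch of the intended semantics using the core `Table` API. The `BranchReplaceSketch` class and `replaceBranchState` helper are hypothetical, and a real implementation would go through the catalog/planner path rather than committing directly like this:

```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.OverwriteFiles;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

class BranchReplaceSketch {
  // Replace a branch's state with a new schema and new data files, leaving main alone.
  static void replaceBranchState(
      Table table, Schema newSchema, Iterable<DataFile> newFiles, String branch) {
    // Add the new schema; unionByNameWith keeps existing field ids stable. Note that
    // this also updates the table's current schema, which the real fix would avoid.
    table.updateSchema().unionByNameWith(newSchema).commit();

    // Commit an overwrite snapshot directly to the branch: drop everything the branch
    // currently contains, add the new files, and point the branch at the result.
    OverwriteFiles overwrite =
        table.newOverwrite().overwriteByRowFilter(Expressions.alwaysTrue()).toBranch(branch);
    newFiles.forEach(overwrite::addFile);
    overwrite.commit();
  }
}
```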