rdblue commented on a change in pull request #1738: URL: https://github.com/apache/iceberg/pull/1738#discussion_r520004493
########## File path: spark3/src/main/java/org/apache/spark/sql/connector/catalog/SupportsRowLevelOperations.java ##########
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.sql.connector.write.LogicalWriteInfo;
+import org.apache.spark.sql.connector.write.RowLevelOperationsBuilder;
+
+public interface SupportsRowLevelOperations extends Table {

Review comment:

I like replace because it is in the context of Spark operations, not Iceberg. For Spark, this is planning an operation to replace what was read, and I think it describes that case well. And in Spark, overwrite is used to configure a normal write operation in the builder; since this uses a completely different builder, it seems like it should use a different name. I agree that it isn't what we use "replace" for in Iceberg. This would be an overwrite in Iceberg.

> We still want to propagate the expression that was used to scan for checking concurrently appended files under serializable isolation.

Okay, interesting point. So you're saying that we would use this to plan all row-level operations, and this would be the only way to get a `DeltaWriter`?
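The scan-based path under discussion might look roughly like the following. This is only a sketch with hypothetical stand-in types (`RowLevelOperationsBuilder`, `DeltaWriter`, and their methods here are illustrative, not the actual Spark or Iceberg APIs); the point it illustrates is that the builder carries the scan expression forward, so a serializable-isolation commit can re-check for concurrently appended files matching that expression.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the interfaces under discussion; not real Spark APIs.
public class RowLevelSketch {
  interface RowLevelOperationsBuilder {
    // Capture the expression that was used to scan the table.
    RowLevelOperationsBuilder scanFilter(String filterExpr);
    DeltaWriter buildDeltaWriter();
  }

  interface DeltaWriter {
    void delete(String rowId);
    // Under serializable isolation, the commit must verify that no files
    // matching the original scan filter were appended concurrently.
    String commitSummary();
  }

  static class SketchBuilder implements RowLevelOperationsBuilder {
    private String filter = "true";

    public RowLevelOperationsBuilder scanFilter(String filterExpr) {
      this.filter = filterExpr;
      return this;
    }

    public DeltaWriter buildDeltaWriter() {
      final String captured = filter;
      return new DeltaWriter() {
        private final List<String> deleted = new ArrayList<>();

        public void delete(String rowId) {
          deleted.add(rowId);
        }

        public String commitSummary() {
          // The captured filter travels with the writer, so the commit can
          // use it for the serializable-isolation conflict check.
          return "deleted=" + deleted.size() + ", conflictCheckFilter=" + captured;
        }
      };
    }
  }

  public static void main(String[] args) {
    DeltaWriter writer = new SketchBuilder()
        .scanFilter("date = '2020-11-09'")
        .buildDeltaWriter();
    writer.delete("row-1");
    writer.delete("row-2");
    // prints: deleted=2, conflictCheckFilter=date = '2020-11-09'
    System.out.println(writer.commitSummary());
  }
}
```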
My main question is whether we think that is going to make it harder to build a `DeltaWriter` that doesn't support copy-on-write. Take a JDBC source, for example. It has no reason to use copy-on-write because all of its operations are row-level. So would we be over-complicating that source just to avoid passing an extra filter to a `DeltaWriter` that is created using a normal `WriteBuilder`? I think it would be. The simpler case for a JDBC implementation is to add `SupportsDelta` to `WriteBuilder` and not require a separate builder. But we would also need an interface for passing the filter to the write for both `WriteBuilder` and `ReplaceBuilder`, and I agree that it is a bit cleaner to rely on this information coming from the scan configuration itself.
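The simpler alternative described above, for a row-level-only source like JDBC, could be sketched as follows. `SupportsDelta` here is the mixin being proposed in the discussion, but the shapes of all these interfaces are assumptions for illustration, not the actual Spark connector API: the source opts into delta writes through its normal `WriteBuilder` rather than implementing a separate builder.

```java
// Hypothetical sketch of mixing SupportsDelta into a normal WriteBuilder;
// these types are stand-ins, not the actual Spark interfaces.
public class DeltaMixinSketch {
  interface WriteBuilder {
    String buildWriteDescription();
  }

  // Optional mixin: sources that can apply row-level deltas directly opt in.
  interface SupportsDelta extends WriteBuilder {
    String buildDeltaDescription();
  }

  // A JDBC-style builder: every operation is row-level, so there is no
  // copy-on-write path and no need for a separate builder type.
  static class JdbcWriteBuilder implements SupportsDelta {
    public String buildWriteDescription() {
      return "INSERT batch";
    }

    public String buildDeltaDescription() {
      return "UPDATE/DELETE by key";
    }
  }

  public static void main(String[] args) {
    WriteBuilder builder = new JdbcWriteBuilder();
    // Planner-side check: take the delta path only when the source opts in.
    String plan = (builder instanceof SupportsDelta)
        ? ((SupportsDelta) builder).buildDeltaDescription()
        : builder.buildWriteDescription();
    // prints: UPDATE/DELETE by key
    System.out.println(plan);
  }
}
```

The trade-off this makes visible is the one raised in the comment: the mixin keeps a row-level-only source simple, but the filter then has to be handed to the write through some extra interface instead of arriving via the scan configuration.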
