aokolnychyi commented on code in PR #4708:
URL: https://github.com/apache/iceberg/pull/4708#discussion_r874140003


##########
api/src/main/java/org/apache/iceberg/actions/GenerateChangeSet.java:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+public interface GenerateChangeSet extends Action<GenerateChangeSet, 
GenerateChangeSet.Result> {
+  /**
+   * Emit changed data set by a snapshot id.
+   *
+   * @param snapshotId id of the snapshot to generate changed data
+   * @return this for method chaining
+   */
+  GenerateChangeSet forSnapshot(long snapshotId);
+
+  /**
+   * Emit changed data set for the current snapshot.
+   *
+   * @return this for method chaining
+   */
+  GenerateChangeSet forCurrentSnapshot();
+
+  /**
+   * Emit changed data from a particular snapshot(exclusive).
+   *
+   * @param fromSnapshotId id of the start snapshot
+   * @return this for method chaining
+   */
+  GenerateChangeSet afterSnapshot(long fromSnapshotId);
+
+  /**
+   * Emit change data set from the start snapshot (exclusive) to the end 
snapshot (inclusive).
+   *
+   * @param fromSnapshotId id of the start snapshot
+   * @param toSnapshotId   id of the end snapshot
+   * @return this for method chaining
+   */
+  GenerateChangeSet betweenSnapshots(long fromSnapshotId, long toSnapshotId);
+
+  /**
+   * The action result that contains a dataset of changed rows.
+   */
+  interface Result<T> {

Review Comment:
   I personally don't think the action API should be limited to table 
maintenance. When we added it, the idea was to provide solutions for common 
problems that require a query engine. Initially, those happened to be mostly 
related to table maintenance. However, I wouldn't mind actions for other 
purposes.
   
   We all agree to share the planning part across engines via a dedicated scan 
API. However, we still need to build an engine-specific representation 
somewhere. Of course, we could have a utility but I am not a big fan of 
exposing utilities to users. The action API is much more user-facing and 
requires us to think about the proper interface and keep the compatibility. We 
had many issues with Spark utils exposed to the users in the past.
   
   One thing about using the action API for CDC that concerns me is the need to 
parameterize the result. Our action API was engine agnostic so far but we need 
to return Spark `Dataset` here. I am not sure how big of a deal it is.
   
   Thoughts, @flyrain @rdblue @RussellSpitzer @szehon-ho @stevenzwu?
   
   @rdblue @stevenzwu, I think both of you had concerns about using an action. 
Could you elaborate a little bit? Given that we will have a common Scan API to 
share the planning logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to