flyrain commented on code in PR #4870: URL: https://github.com/apache/iceberg/pull/4870#discussion_r893871912
########## api/src/main/java/org/apache/iceberg/ChangelogScanTask.java: ########## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg; + +/** + * A changelog scan task. + */ +public interface ChangelogScanTask extends FileScanTask { + /** + * Returns the operation type (i.e. insert/delete). + */ + ChangelogOperation operation(); + + /** + * Returns the relative commit order in which the changes must be applied. + */ + int commitOrder(); + + /** + * Returns the snapshot ID in which the changes were committed. + */ + long commitSnapshotId(); Review Comment: What will `commitSnapshotId()` return in that case? s1 or s2, or a list of snapshot ids? I'd think the snapshot ids may not be useful if we want to calculate the net changes across snapshots. For example, `IncrementalAppendScan` could be considered as one of cases that outputs scan tasks across snapshots, which doesn't need the snapshot id. One solution is to add another layer of interface to support the net changes across snapshots. For example, we can use `ChangelogScanTask` as a parent for both cases of one snapshot and multiple snapshots, and then create a new interface `SnapshotChangelogScanTask` to extend it for use case of a single snapshot. I think the current solution is good enough for CDC implementation. But, since it is an interface change, we may want to make sure it is extendable for the future use cases. If it is something we will do to support net changes across snapshots, we may choose the right the right hierarchy and names now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
