alexeykudinkin commented on a change in pull request #4880: URL: https://github.com/apache/hudi/pull/4880#discussion_r819910810
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/DeleteKey.java
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import java.util.Objects;
+
+/**
+ * Delete key is a combination of HoodieKey and ordering value.
+ * The key is used for {@link org.apache.hudi.common.table.log.block.HoodieDeleteBlock}
+ * to support per-record deletions. The deletion block is always appended after the data block,
+ * we need to keep the ordering val to combine with the data records when merging, or the data may
+ * be dropped if there are intermediate deletions for the inputs
+ * (a new INSERT comes after a DELETE in one input batch).
+ */
+public class DeleteKey extends HoodieKey {

Review comment:

> Can you just go over with the work flow in detail of COW and MOR combining process first ? The DELETE records encode/decode for MOR table is always there for efficiency. And i didn't introduce new diverge because it is designed there before. This PR only tag the old delete keys with version number and fix the event time semantics,

Of course I'm aware of the merging process in both COW and MOR.
The efficiency of a DELETE block does not come from storing only the keys of the records we plan to delete; it comes from not having to rewrite the whole base file to change (potentially) just a single record, right?

> And i didn't introduce new diverge because it is designed there before. This PR only tag the old delete keys with version number and fix the event time semantics,

Apologies, I miscommunicated that. What I meant was that adding `DeleteKey` makes the MOR implementation diverge further from the COW implementation.

> +1 on providing concrete suggestions to move forward.

I outlined a concrete suggestion in my very first comments: carry `orderingVal` (and potentially additional payload in the future) in a proper `RecordPayload` wrapper, so that we store not only the keys of the deleted records but also a "tombstone" payload with meta information (just `orderingVal` for now).
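
To make the tombstone idea concrete, here is a minimal sketch of what I have in mind. It is an illustration only and does not use Hudi's actual classes: `TombstoneMergeSketch`, `LogRecord`, and `merge` are hypothetical names, and the "highest ordering value wins, later entry wins ties" rule is an assumption made for the example. The point is that a delete becomes an ordinary entry carrying an ordering value, so it goes through the same merge rule as data records.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Hudi's actual API: every log entry, including a delete,
// carries an ordering value, so deletes and data records merge by the same rule.
final class TombstoneMergeSketch {

  static final class LogRecord {
    final String recordKey;
    final long orderingVal;
    final String data; // null means this entry is a tombstone (a delete)

    LogRecord(String recordKey, long orderingVal, String data) {
      this.recordKey = recordKey;
      this.orderingVal = orderingVal;
      this.data = data;
    }

    boolean isTombstone() {
      return data == null;
    }
  }

  // For each key, keep the entry with the highest ordering value (on a tie, the
  // entry that appears later in the log wins), then drop keys whose winner is a
  // tombstone. This rule is an assumption of the sketch.
  static Map<String, String> merge(List<LogRecord> log) {
    Map<String, LogRecord> winners = new HashMap<>();
    for (LogRecord r : log) {
      winners.merge(r.recordKey, r,
          (prev, next) -> next.orderingVal >= prev.orderingVal ? next : prev);
    }
    Map<String, String> live = new HashMap<>();
    for (LogRecord r : winners.values()) {
      if (!r.isTombstone()) {
        live.put(r.recordKey, r.data);
      }
    }
    return live;
  }

  public static void main(String[] args) {
    // The delete block is appended after the data block, even though the
    // INSERT (ordering value 2) logically follows the DELETE (ordering value 1).
    List<LogRecord> log = Arrays.asList(
        new LogRecord("k1", 2L, "v2"),   // data block
        new LogRecord("k1", 1L, null));  // delete block, appended last
    System.out.println(merge(log));
  }
}
```

In this sketch the INSERT for `k1` survives because its ordering value is higher than the tombstone's, which is the "INSERT after DELETE in one input batch" case from the Javadoc. A keys-only delete block, applied purely in log order, would have dropped it.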
