alexeykudinkin commented on a change in pull request #4880:
URL: https://github.com/apache/hudi/pull/4880#discussion_r819910810



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/DeleteKey.java
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import java.util.Objects;
+
+/**
+ * Delete key is a combination of HoodieKey and ordering value.
+ * The key is used for {@link org.apache.hudi.common.table.log.block.HoodieDeleteBlock}
+ * to support per-record deletions. The deletion block is always appended after the data block,
+ * we need to keep the ordering val to combine with the data records when merging, or the data may
+ * be dropped if there are intermediate deletions for the inputs
+ * (a new INSERT comes after a DELETE in one input batch).
+ */
+public class DeleteKey extends HoodieKey {

Review comment:
   > Can you just go over the workflow of the COW and MOR combining process in detail first? The DELETE record encode/decode for MOR tables has always been there for efficiency. And I didn't introduce a new divergence, because it was designed that way before. This PR only tags the old delete keys with a version number and fixes the event time semantics.
   
   Of course I'm aware of the merging process in both COW and MOR. The efficiency of the DELETE block doesn't come from the fact that we store only the keys of the records we're planning to delete, right?
   
   > And I didn't introduce a new divergence, because it was designed that way before. This PR only tags the old delete keys with a version number and fixes the event time semantics.
   
   I miscommunicated that: what I meant was that adding `DeleteKey` makes it diverge further from the COW implementation.
   
   > +1 on providing concrete suggestions to move forward.
   
   I outlined a concrete suggestion in my very first comments: fill in `orderingVal` (as well as potentially additional payload in the future) in a proper `RecordPayload` casing, so that we store not only the keys of the deleted records but also a payload with meta information (`orderingVal` for now).
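
   To make the suggestion concrete, here is a minimal sketch of the shape being proposed: a delete entry that holds a key together with a small payload, instead of a key subclass. Every name in it (`DeleteRecordSketch`, `Key`, `DeletePayload`, `DeleteRecord`, `deleteWins`) is hypothetical and not taken from the Hudi codebase. The `long` ordering value and the rule that a tie goes to the delete are assumptions made only to keep the example small.

```java
import java.util.Objects;

// Hypothetical sketch: none of these types exist in Hudi under these names.
public class DeleteRecordSketch {

  /** Stand-in for a record key: record key plus partition path. */
  public static final class Key {
    public final String recordKey;
    public final String partitionPath;

    public Key(String recordKey, String partitionPath) {
      this.recordKey = Objects.requireNonNull(recordKey);
      this.partitionPath = Objects.requireNonNull(partitionPath);
    }
  }

  /** Meta information stored next to a deleted key (only orderingVal for now). */
  public static final class DeletePayload {
    public final long orderingVal; // assumption: ordering value modelled as a long

    public DeletePayload(long orderingVal) {
      this.orderingVal = orderingVal;
    }
  }

  /** A delete entry: the key stays a plain key, the extras live in the payload. */
  public static final class DeleteRecord {
    public final Key key;
    public final DeletePayload payload;

    public DeleteRecord(Key key, DeletePayload payload) {
      this.key = Objects.requireNonNull(key);
      this.payload = Objects.requireNonNull(payload);
    }
  }

  /**
   * Decides whether the delete applies to a data record with the given ordering value.
   * Assumption: on a tie the delete wins.
   */
  public static boolean deleteWins(DeleteRecord delete, long dataOrderingVal) {
    return delete.payload.orderingVal >= dataOrderingVal;
  }

  public static void main(String[] args) {
    DeleteRecord delete =
        new DeleteRecord(new Key("id1", "2022/03/01"), new DeletePayload(5L));
    System.out.println(deleteWins(delete, 3L)); // true: the data record is older than the delete
    System.out.println(deleteWins(delete, 7L)); // false: a newer INSERT survives the delete
  }
}
```

   With this shape, adding more meta information later would only touch the payload type, and the key class would stay untouched.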




