[ 
https://issues.apache.org/jira/browse/HBASE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701113#comment-13701113
 ] 

Lars Hofhansl commented on HBASE-8753:
--------------------------------------

Looked at v2. Three comments:
1.
{code}
+      } else if (type == KeyValue.Type.DeleteFamilyVersion.getCode()) {
+        if (familyVersionStamps.isEmpty()) {
+          familyVersionStamps.add(timestamp);
+        } else {
+          long minTimeStamp= familyVersionStamps.first();
+          assert timestamp <= minTimeStamp : "deleteFamilyStamp " + minTimeStamp +
+            " followed by a bigger one " + timestamp;
+
+          // remove duplication(ignore deleteFamilyVersion with same timestamp)
+          if (timestamp < minTimeStamp) {
+            familyVersionStamps.add(timestamp);
+          }
+        }
+        return;
{code}
This all seems like overkill just to check the correct sort order. We can just do:
{code}
+      } else if (type == KeyValue.Type.DeleteFamilyVersion.getCode()) {
+        familyVersionStamps.add(timestamp);
+        return;
{code}
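
To illustrate why the shorter form should be enough: the patch calls familyVersionStamps.first(), so the collection is presumably a SortedSet<Long>. The sketch below assumes a TreeSet<Long>; the class and method names are made up for illustration and are not taken from the patch. A set already ignores a duplicate timestamp, so the explicit duplicate check buys nothing.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for the relevant part of ScanDeleteTracker; not the patch's code.
class FamilyVersionStampsSketch {
  // Assumption: the patch keeps the timestamps in a sorted set (it calls first()).
  private final SortedSet<Long> familyVersionStamps = new TreeSet<Long>();

  // Simplified handling of a DeleteFamilyVersion marker: just record the timestamp.
  void addFamilyVersion(long timestamp) {
    familyVersionStamps.add(timestamp); // no-op if the timestamp is already present
  }

  // A cell is masked by a family version marker only on an exact timestamp match.
  boolean isDeletedByFamilyVersion(long timestamp) {
    return familyVersionStamps.contains(timestamp);
  }

  int trackedCount() {
    return familyVersionStamps.size();
  }

  public static void main(String[] args) {
    FamilyVersionStampsSketch tracker = new FamilyVersionStampsSketch();
    tracker.addFamilyVersion(300L);
    tracker.addFamilyVersion(300L); // duplicate marker, ignored by the set
    tracker.addFamilyVersion(100L);
    System.out.println(tracker.trackedCount());                 // 2
    System.out.println(tracker.isDeletedByFamilyVersion(300L)); // true
    System.out.println(tracker.isDeletedByFamilyVersion(200L)); // false
  }
}
```

The sort-order assert is dropped here on purpose; if that check is still wanted, it could live in a test rather than in the scan path.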

2.
Also, if there is a normal family delete marker with a timestamp newer than the 
family version marker, we do not need to store the version delete marker at all 
(as the row is already targeted for delete).
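
A rough sketch of that check, under a few assumptions: the tracker remembers the newest DeleteFamily timestamp it has seen, a DeleteFamily marker masks every cell at or below its timestamp, and the family marker is seen before the older version marker. All names below are hypothetical and not taken from the patch.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical tracker fragment; names are illustrative, not the patch's code.
class FamilyVersionSkipSketch {
  // Assumption: newest DeleteFamily timestamp seen so far, valid only if hasFamilyDelete.
  private boolean hasFamilyDelete = false;
  private long familyStamp = 0L;
  private final SortedSet<Long> familyVersionStamps = new TreeSet<Long>();

  void addDeleteFamily(long timestamp) {
    familyStamp = hasFamilyDelete ? Math.max(familyStamp, timestamp) : timestamp;
    hasFamilyDelete = true;
  }

  void addDeleteFamilyVersion(long timestamp) {
    // Already covered by a family delete at or above this timestamp: nothing to store.
    if (hasFamilyDelete && timestamp <= familyStamp) {
      return;
    }
    familyVersionStamps.add(timestamp);
  }

  boolean isDeleted(long cellTimestamp) {
    return (hasFamilyDelete && cellTimestamp <= familyStamp)
        || familyVersionStamps.contains(cellTimestamp);
  }

  int trackedVersionMarkers() {
    return familyVersionStamps.size();
  }

  public static void main(String[] args) {
    FamilyVersionSkipSketch tracker = new FamilyVersionSkipSketch();
    tracker.addDeleteFamily(500L);
    tracker.addDeleteFamilyVersion(300L); // redundant, covered by the family delete
    tracker.addDeleteFamilyVersion(700L); // newer than the family delete, must be kept
    System.out.println(tracker.trackedVersionMarkers()); // 1
    System.out.println(tracker.isDeleted(300L));         // true
    System.out.println(tracker.isDeleted(600L));         // false
  }
}
```

This also trims the number of version markers held in memory whenever a family delete already covers them.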

3.
Lastly, this is the only delete marker type for which multiple markers need to be 
kept in memory during a scan. There can never be more than one family, 
column, or version delete marker that needs to be tracked, but for the family 
version marker we potentially need to track arbitrarily many. That *is* a 
concern.
                
> Provide new delete flag which can delete all cells under a column-family 
> which have a same designated timestamp
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8753
>                 URL: https://issues.apache.org/jira/browse/HBASE-8753
>             Project: HBase
>          Issue Type: New Feature
>          Components: Deletes, Scanners
>    Affects Versions: 0.95.1
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>         Attachments: 8753-trunk-V2.patch, HBASE-8753-0.94-V0.patch, 
> HBASE-8753-trunk-V0.patch, HBASE-8753-trunk-V1.patch
>
>
> In one of our production scenarios (Xiaomi message search), multiple cells 
> are put in a batch using the same timestamp, with different column names, under 
> a specific column-family. 
> After some time these cells also need to be deleted in a batch, given a 
> specific timestamp. But the column names are parsed tokens, which can be 
> arbitrary words, so such a batch delete is impossible without first retrieving 
> all KVs from that CF, building the list of columns that have a KV with that 
> timestamp, and then issuing an individual deleteColumn for each column in 
> that list.
> Though such a batch delete is possible, its performance is poor, and 
> customers find their code quite clumsy because it must first retrieve and 
> populate the column list and then issue a deleteColumn for each column in 
> that list.
> This feature resolves this problem by introducing a new delete flag: 
> DeleteFamilyVersion. 
>   1). When you need to delete all KVs under a column-family with a given 
> timestamp, just call Delete.deleteFamilyVersion(cfName, timestamp); only a 
> DeleteFamilyVersion type KV is put to HBase (like DeleteFamily / DeleteColumn 
> / Delete) without read operation;
>   2). Like other delete types, DeleteFamilyVersion takes effect in 
> get/scan/flush/compact operations: the ScanDeleteTracker now parses out and 
> uses DeleteFamilyVersion to prevent all KVs under the specific CF that have 
> the same timestamp as the DeleteFamilyVersion KV from showing up in a 
> get/scan result (also in flush/compact).
> Our customers find this feature efficient, clean and easy-to-use since it 
> does its work without knowing the exact column names list that needs to be 
> deleted. 
> This feature has been running smoothly for a couple of months in our 
> production clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
