[jira] [Comment Edited] (PHOENIX-6213) Extend Cell Tags to Delete object.

Rushabh Shah (Jira) Mon, 23 Nov 2020 10:31:19 -0800


    [ 
https://issues.apache.org/jira/browse/PHOENIX-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237581#comment-17237581
 ]


Rushabh Shah edited comment on PHOENIX-6213 at 11/23/20, 6:30 PM:
------------------------------------------------------------------

In the PR ([https://github.com/apache/phoenix/pull/978]) I have used many 
methods from the PrivateCellUtil class, which is annotated IA.PRIVATE. I 
understand that we don't want downstream projects to use classes annotated as 
private, but PrivateCellUtil has many useful APIs that I rely on for this work. 
[~gjacoby] pointed out alternatives such as RawCell and RawCellBuilder, which 
are IA.LIMITEDPRIVATE classes, but using them will need some additional 
processing for my use case.

For example: 

1. PrivateCellUtil#createCell(Cell cell, List<Tag> tags) accepts an existing 
Cell and a list of tags and creates a new cell. 

RawCellBuilder, on the other hand, has no method that accepts a cell. I need 
to convert my input cell explicitly by extracting all of its fields, passing 
them to the builder methods (setRow, setFamily, etc.), and then calling the 
build method.

 

2. PrivateCellUtil#getTags(Cell cell) returns a list of the existing tags, to 
which I want to add a new tag.

RawCell#getTags() returns an Iterator<Tag>, so I have to iterate over the tags 
and, depending on whether they are byte-buffer backed or array backed, convert 
them to a List, since RawCellBuilder#setTags accepts a List of Tags. 
PrivateCellUtil#getTags already does this conversion.

 

All of these conversion utility methods would need to be duplicated in the 
Phoenix project as well.
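
To illustrate the kind of helper that would have to be duplicated for the 
second example, here is a minimal sketch. It deliberately does not use the 
HBase classes: SimpleTag is a hypothetical stand-in for Tag, and withExtraTag 
is an illustrative name, not an existing HBase or Phoenix method. The 
byte-buffer versus array-backed handling mentioned above is omitted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TagCopySketch {
    /** Hypothetical stand-in for an HBase Tag: a type byte plus a value. */
    static final class SimpleTag {
        final byte type;
        final byte[] value;

        SimpleTag(byte type, byte[] value) {
            this.type = type;
            this.value = value;
        }
    }

    /**
     * Drains an iterator of existing tags into a new mutable list and
     * appends one extra tag, leaving the source collection untouched.
     */
    static List<SimpleTag> withExtraTag(Iterator<SimpleTag> existing, SimpleTag extra) {
        List<SimpleTag> tags = new ArrayList<>();
        while (existing.hasNext()) {
            tags.add(existing.next());
        }
        tags.add(extra);
        return tags;
    }

    public static void main(String[] args) {
        List<SimpleTag> current = new ArrayList<>();
        current.add(new SimpleTag((byte) 1, new byte[] {10}));
        SimpleTag sourceTag = new SimpleTag((byte) 2, new byte[] {20});

        List<SimpleTag> combined = withExtraTag(current.iterator(), sourceTag);
        System.out.println("tags after adding source tag: " + combined.size());
    }
}
```

With the real classes, the resulting list is what would be handed to 
RawCellBuilder#setTags.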

[~apurtell] [~gjacoby] To avoid this duplication, do you think it makes sense 
to mark PrivateCellUtil as 
InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)? Please advise.

 



> Extend Cell Tags to Delete object.
> ----------------------------------
>
>                 Key: PHOENIX-6213
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6213
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>
> We want to track the source of mutations (especially Deletes) made via 
> Phoenix. Multiple use cases perform deletes, namely: customers deleting 
> their data, internal processes such as GDPR compliance, and Phoenix TTL MR 
> jobs. For every mutation we want to track the source of the operation that 
> initiated the delete.
> At my day job, we have a custom Backup/Restore tool.
> For example: During a GDPR compliance cleanup (let's say at time t0), we 
> mistakenly deleted some customer data, and it is possible that the customer 
> also deleted some data from their side (at time t1). To recover the 
> mistakenly deleted data, we restore from the backup at time (t0 - 1). By 
> doing this, we also recover the data that the customer intentionally deleted.
> We need a way for the Restore tool to recover data selectively.
> To explain with an example:
> Let's say there are 2 different systems (call them accidental-delete and 
> customer-delete) deleting data from the same table at almost the same 
> time. As the names suggest, customer-delete performs intentional deletes and 
> accidental-delete performs deletes done by mistake. We have a restore tool 
> that restores all the data between a start time and an end time (start-ts 
> and end-ts). We want to restore the data deleted by the accidental-delete 
> system but not the data deleted by the customer-delete system. By adding a 
> cell tag to Delete Markers, we can avoid restoring data deleted by the 
> customer-delete system.
> In my proposal, I want to add cell tags to the Tombstone delete marker so 
> that we have that tag in the backups. In case we have to restore data, we 
> can restore specific rows depending on the tag present in the cell.
> We want to leverage the Cell Tag feature for Delete mutations to store this 
> metadata. Currently the Delete object doesn't support the Tag feature.
> We also want a solution that can be easily extended to other mutations such 
> as Put.
> Some of the use cases I can think of where we can use tags for Put mutations 
> are:
> 1. Identifying whether the put came from the primary cluster or the 
> replicated cluster, so that we can make the backup tool smarter and not back 
> up the same put twice in the source and replicated clusters.
> 2. We have a multi-tenancy concept in Phoenix. We want to track whether the 
> upsert (a put operation in HBase) came from a Global or a Tenant connection.
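
The selective restore described in the issue above could be sketched roughly 
as follows. This is only an illustration under stated assumptions: 
DeleteMarker, its fields, rowsToRestore, and the tag values are all 
hypothetical and are not part of Phoenix, HBase, or any existing restore 
tool. The idea is that the tool undoes only the deletes whose marker falls 
inside the restore window and carries the tag of the source to be reverted.

```java
import java.util.ArrayList;
import java.util.List;

class RestoreFilterSketch {
    /** Hypothetical view of a delete marker read from a backup. */
    static final class DeleteMarker {
        final String rowKey;
        final long timestamp;
        final String sourceTag; // e.g. "accidental-delete"; null if untagged

        DeleteMarker(String rowKey, long timestamp, String sourceTag) {
            this.rowKey = rowKey;
            this.timestamp = timestamp;
            this.sourceTag = sourceTag;
        }
    }

    /**
     * Returns the row keys whose deletes should be undone: markers inside
     * [startTs, endTs] that were written by the given source.
     */
    static List<String> rowsToRestore(List<DeleteMarker> markers,
                                      long startTs, long endTs,
                                      String sourceToUndo) {
        List<String> rows = new ArrayList<>();
        for (DeleteMarker marker : markers) {
            boolean inWindow = marker.timestamp >= startTs && marker.timestamp <= endTs;
            // Untagged markers and markers from other sources are left alone.
            if (inWindow && sourceToUndo.equals(marker.sourceTag)) {
                rows.add(marker.rowKey);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<DeleteMarker> markers = new ArrayList<>();
        markers.add(new DeleteMarker("row-a", 100L, "accidental-delete"));
        markers.add(new DeleteMarker("row-b", 105L, "customer-delete"));
        System.out.println(rowsToRestore(markers, 90L, 110L, "accidental-delete"));
    }
}
```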



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
