[
https://issues.apache.org/jira/browse/PHOENIX-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237581#comment-17237581
]
Rushabh Shah edited comment on PHOENIX-6213 at 11/23/20, 6:30 PM:
------------------------------------------------------------------
In the PR ([https://github.com/apache/phoenix/pull/978]) I use many of the
methods from the PrivateCellUtil class, which is annotated IA.PRIVATE. I
understand that we don't want downstream projects to use classes annotated as
private, but PrivateCellUtil has many powerful APIs that I rely on for this
work. [~gjacoby] pointed out alternatives such as RawCell and RawCellBuilder,
which are IA.LIMITEDPRIVATE classes, but they would need some additional
processing for my use case.
For example:
1. +PrivateCellUtil#createCell(Cell cell, List<Tag> tags)+ accepts an existing
Cell and a list of tags and creates a new cell from them.
RawCellBuilder, however, has no method that accepts a cell. I would need to
convert my input cell explicitly by extracting all of its fields, calling the
builder's setters (setRow, setFamily, etc.), and then calling the build
method.
2. +PrivateCellUtil#getTags(Cell cell)+ returns a list of the existing tags, to
which I want to add a new tag.
RawCell#getTags(), however, returns an Iterator<Tag>, so I would have to
iterate over the tags and, depending on whether they are byte buffer backed or
array backed, convert them to a List, since RawCellBuilder#setTags accepts a
List of Tags. PrivateCellUtil#getTags already does this conversion.
All of these conversion utility methods would need to be duplicated in the
Phoenix project as well.
[~apurtell] [~gjacoby] To avoid this duplication, do you think it makes sense
to mark PrivateCellUtil as
InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)? Please advise.
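To make the extra work in examples 1 and 2 concrete, here is a minimal, self-contained sketch of the two conversions. It deliberately does not use the HBase classes: Tag, SimpleCell and SimpleCellBuilder below are hypothetical stand-ins that only mimic the shape described above (tags exposed as an Iterator, a builder with per-field setters and no method that accepts a cell), so the names and fields are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Stand-in types only: these are NOT the HBase Cell/Tag/RawCellBuilder classes.
class TagCopySketch {

    /** Hypothetical stand-in for a tag: a type byte plus a value. */
    static final class Tag {
        final byte type;
        final String value;
        Tag(byte type, String value) { this.type = type; this.value = value; }
    }

    /** Hypothetical stand-in for a cell that exposes its tags only as an Iterator. */
    static final class SimpleCell {
        final String row, family, qualifier, value;
        final long timestamp;
        final List<Tag> tags;
        SimpleCell(String row, String family, String qualifier, long timestamp,
                   String value, List<Tag> tags) {
            this.row = row; this.family = family; this.qualifier = qualifier;
            this.timestamp = timestamp; this.value = value;
            this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
        }
        Iterator<Tag> getTags() { return tags.iterator(); }
    }

    /** Hypothetical builder: per-field setters, no "copy from an existing cell" method. */
    static final class SimpleCellBuilder {
        private String row, family, qualifier, value;
        private long timestamp;
        private List<Tag> tags = new ArrayList<>();
        SimpleCellBuilder setRow(String r) { row = r; return this; }
        SimpleCellBuilder setFamily(String f) { family = f; return this; }
        SimpleCellBuilder setQualifier(String q) { qualifier = q; return this; }
        SimpleCellBuilder setTimestamp(long ts) { timestamp = ts; return this; }
        SimpleCellBuilder setValue(String v) { value = v; return this; }
        SimpleCellBuilder setTags(List<Tag> t) { tags = t; return this; }
        SimpleCell build() {
            return new SimpleCell(row, family, qualifier, timestamp, value, tags);
        }
    }

    /** Example 2: drain the tag iterator into a mutable list. */
    static List<Tag> tagsAsList(SimpleCell cell) {
        List<Tag> out = new ArrayList<>();
        Iterator<Tag> it = cell.getTags();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    /** Example 1: copy every field by hand, then attach the extended tag list. */
    static SimpleCell withExtraTag(SimpleCell cell, Tag extra) {
        List<Tag> tags = tagsAsList(cell);
        tags.add(extra);
        return new SimpleCellBuilder()
                .setRow(cell.row)
                .setFamily(cell.family)
                .setQualifier(cell.qualifier)
                .setTimestamp(cell.timestamp)
                .setValue(cell.value)
                .setTags(tags)
                .build();
    }

    public static void main(String[] args) {
        SimpleCell original = new SimpleCell("row1", "cf", "q", 42L, "v",
                Arrays.asList(new Tag((byte) 1, "existing")));
        SimpleCell tagged = withExtraTag(original, new Tag((byte) 2, "source-of-delete"));
        System.out.println(tagged.tags.size() + " tags on row " + tagged.row);
    }
}
```

Both helper methods are pure boilerplate, which is the point of the question above: this is the kind of code that would have to be copied into Phoenix if only the builder-style API is available.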
> Extend Cell Tags to Delete object.
> ----------------------------------
>
> Key: PHOENIX-6213
> URL: https://issues.apache.org/jira/browse/PHOENIX-6213
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Rushabh Shah
> Assignee: Rushabh Shah
> Priority: Major
>
> We want to track the source of mutations (especially Deletes) issued via
> Phoenix. We have multiple use cases that perform deletes, namely: customers
> deleting their data, internal processes like GDPR compliance, and Phoenix TTL
> MR jobs. For every mutation we want to track the source of the operation
> that initiated the delete.
> At my day job, we have a custom Backup/Restore tool.
> For example: during a GDPR compliance cleanup (let's say at time t0), we
> mistakenly deleted some customer data, and it is possible that the customer
> also deleted some data from their side (at time t1). To recover the
> mistakenly deleted data, we restore from the backup at time (t0 - 1). By
> doing this, we also recover the data that the customer intentionally deleted.
> We need a way for the Restore tool to recover data selectively.
> To explain via an example:
> Let's say there are 2 different systems (call them accidental-delete and
> customer-delete) deleting data from the same table at almost the same time.
> As the names suggest, customer-delete performs intentional deletes and
> accidental-delete performs deletes done by mistake. We have a restore tool
> that restores all the data between a start time and an end time (start-ts
> and end-ts). We want to restore the data deleted by the accidental-delete
> system and not the data deleted by the customer-delete system. By adding a
> cell tag to delete markers, we can avoid restoring the data deleted by the
> customer-delete system.
> In my proposal, I want to add cell tags to the tombstone delete marker so
> that the tag is present in the backups. In case we have to restore data, we
> can restore a specific row depending on the tag present in the cell.
> We want to leverage the Cell Tag feature for Delete mutations to store this
> metadata. Currently the Delete object doesn't support tags.
> We also want a solution that is easily extensible to other mutations like
> Put.
> Some of the use cases I can think of where we can use tags for Put mutations
> are:
> 1. Identifying whether the put came from the primary cluster or the
> replicated cluster, so that we can make the backup tool smarter and not back
> up the same put twice in the source and replicated clusters.
> 2. Phoenix has a multi-tenancy concept. We want to track whether the upsert
> (a put operation in HBase) came from a Global or a Tenant connection.
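
The selective restore described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions: DeleteMarker and rowsToRestore are hypothetical names, the tag is modeled as a plain string, and no HBase or Phoenix API is used. The labels accidental-delete and customer-delete and the start-ts/end-ts window come from the description; everything else is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a restore tool that filters delete markers by source tag.
class SelectiveRestoreSketch {

    /** Stand-in for a tombstone delete marker carrying a source tag. */
    static final class DeleteMarker {
        final String row;
        final long timestamp;
        final String sourceTag;
        DeleteMarker(String row, long timestamp, String sourceTag) {
            this.row = row;
            this.timestamp = timestamp;
            this.sourceTag = sourceTag;
        }
    }

    /**
     * Returns the rows whose delete should be undone: markers inside
     * [startTs, endTs] whose tag matches the source we want to roll back.
     */
    static List<String> rowsToRestore(List<DeleteMarker> markers,
                                      long startTs, long endTs,
                                      String sourceToUndo) {
        List<String> rows = new ArrayList<>();
        for (DeleteMarker m : markers) {
            boolean inWindow = m.timestamp >= startTs && m.timestamp <= endTs;
            if (inWindow && sourceToUndo.equals(m.sourceTag)) {
                rows.add(m.row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<DeleteMarker> markers = Arrays.asList(
                new DeleteMarker("row-a", 100L, "accidental-delete"),
                new DeleteMarker("row-b", 101L, "customer-delete"),
                new DeleteMarker("row-c", 500L, "accidental-delete"));
        // Only row-a is inside the window and tagged as an accidental delete.
        System.out.println(rowsToRestore(markers, 90L, 200L, "accidental-delete"));
    }
}
```

Without the tag, row-a and row-b are indistinguishable to the restore tool because both deletes fall in the same time window; the tag is what lets it leave the customer's intentional delete in place.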