[ https://issues.apache.org/jira/browse/PHOENIX-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343150#comment-15343150 ]
Poorna Chandra commented on PHOENIX-2993:
-----------------------------------------
bq. Does HBASE-12859 help you here at all?
During a compaction, the transaction coprocessor removes invalid data based on
the invalid list in the latest transaction snapshot available to the region
server. There is no good way to find out which transaction snapshot was in
effect when a region was compacted, because syncing the snapshot to some region
servers can be delayed. By recording the transaction state used during the
compaction of each region, we can determine precisely which invalid data was
removed.
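A minimal sketch of such a per-region record, assuming a simple in-memory store
(the `CompactionLog` and `CompactionRecord` names and fields are hypothetical,
not Tephra's actual API): each major compaction records the invalid list from
the snapshot it actually used, so a stale snapshot on a lagging region server is
captured as-is rather than guessed at later.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class CompactionRecord:
    # Hypothetical record; field names are illustrative, not Tephra's schema.
    region: str
    snapshot_timestamp: int      # timestamp of the snapshot the compaction used
    invalid_ids: FrozenSet[int]  # invalid list contained in that snapshot


class CompactionLog:
    """Hypothetical per-region log of the transaction state used by compactions."""

    def __init__(self) -> None:
        # Only the latest major compaction of each region matters for pruning.
        self._latest: Dict[str, CompactionRecord] = {}

    def record(self, region: str, snapshot_timestamp: int,
               invalid_ids: Iterable[int]) -> None:
        # Store the snapshot actually used, even if it was stale on this server.
        self._latest[region] = CompactionRecord(
            region, snapshot_timestamp, frozenset(invalid_ids))

    def filtered_ids(self, region: str) -> FrozenSet[int]:
        # Invalid IDs whose data the region's last major compaction filtered
        # out; empty if the region has no record yet.
        rec = self._latest.get(region)
        return rec.invalid_ids if rec is not None else frozenset()
```

For example, if region "r2" is compacted with an older snapshot whose invalid
list is only {5}, its record says exactly that, even if "r1" was compacted with
{5, 7} at the same time.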
> Tephra: Prune invalid transaction set once all data for a given invalid
> transaction has been dropped
> ----------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-2993
> URL: https://issues.apache.org/jira/browse/PHOENIX-2993
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Poorna Chandra
> Assignee: Poorna Chandra
> Attachments: ApacheTephraAutomaticInvalidListPruning.pdf
>
>
> From TEPHRA-35 -
> In addition to dropping the data from invalid transactions, we need to be
> able to prune from the invalid set any transaction whose data has been
> completely cleaned up. Without this, the invalid set will grow indefinitely
> and impose an ever-increasing cost on in-progress transactions.
> To do this correctly, the TransactionDataJanitor coprocessor will need to
> maintain some bookkeeping for the transaction data that it removes, so that
> the transaction manager can reason about when all of a given transaction's
> data has been removed. Only at this point can the transaction manager safely
> drop the transaction ID from the invalid set.
> One approach would be for the TransactionDataJanitor to update a table
> marking when a major compaction was performed on a region and what
> transaction IDs were filtered out. Once all regions in a table containing the
> transaction data have been compacted, we can remove the filtered out
> transaction IDs from the invalid set. However, this will need to cope with
> changing region names due to splits, etc.
> Note: This will be moved to the Tephra JIRA once its setup is complete
> (INFRA-11445).
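The pruning rule described in the issue can be sketched as follows, again with
hypothetical names. It assumes that a transaction stops writing once it is in
the invalid list, and that `filtered_by_region` holds, for each region, the
invalid IDs from the snapshot used by its latest major compaction. Tying the
check to the currently live regions is one conservative way to cope with
splits: a new daughter region with no record blocks pruning until it has been
compacted itself, and records for regions that no longer exist are ignored.

```python
from typing import Dict, FrozenSet, Iterable, Set


def prunable_invalid_ids(invalid_set: Set[int],
                         live_regions: Iterable[str],
                         filtered_by_region: Dict[str, FrozenSet[int]]) -> Set[int]:
    """Return the invalid transaction IDs that may be dropped (hypothetical helper).

    An ID qualifies only if the latest major compaction of every currently live
    region filtered it out. Records for regions that no longer exist (for
    example, parents of a split) are ignored.
    """
    regions = list(live_regions)
    if not regions:
        # Nothing to reason about; be conservative and prune nothing.
        return set()
    candidates = set(invalid_set)
    for region in regions:
        filtered = filtered_by_region.get(region)
        if filtered is None:
            # A region without a record (e.g. a new daughter region) might
            # still hold data from any invalid transaction.
            return set()
        candidates &= filtered
    return candidates
```

For example, with `filtered = {"r1": frozenset({5, 7}), "r2": frozenset({5})}`,
calling `prunable_invalid_ids({5, 7, 9}, ["r1", "r2"], filtered)` returns `{5}`;
adding an uncompacted region "r3" to the live list returns an empty set.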