[
https://issues.apache.org/jira/browse/PHOENIX-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113226#comment-15113226
]
Thomas D'Silva commented on PHOENIX-2582:
-----------------------------------------
Attaching a possible solution from an email conversation with [~apurtell]
>In lieu of an (external) transaction manager, maybe you could run a Procedure
>that must complete before the index create is declared successful? Procedure
>is HBase's internal coordination framework. HBase 0.98 and 1.0 have
>ProcedureV1. HBase 1.1+ has ProcedureV2.
>
>Your procedure workers would set the writestate on each region to readonly,
>wait for in flight writes to finish, and then join the barrier. Once inside
>the barrier your workers could make the index related state changes, or just
>return if no further work needed. Your procedure workers would reset
>writestate in the cleanup callback. Your coordinator (in the master) can wait
>on a monitor for global completion or poll on a completion status check. Note
>Procedures will complete in either successful or failed state. Failure may be
>explicit (worker posted failure notice) or a timeout. If failed, you'll need
>to retry. Once one of these has completed successfully, you would be good.
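The steps in the quoted proposal can be sketched roughly as follows. This is a plain-Java model built only on java.util.concurrent, not HBase's Procedure API: Region, runOnce, and runWithRetry are hypothetical names invented for illustration, and the timeouts are arbitrary. Each worker marks its region read-only, waits for in-flight writes to drain, joins the barrier, makes the index state change, and resets the write state in a cleanup step. The coordinator waits for global completion and retries on failure or timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java model of the barrier idea; none of these types are HBase classes.
class IndexBarrierSketch {

    /** Hypothetical stand-in for a region and its write state. */
    static class Region {
        final AtomicBoolean readOnly = new AtomicBoolean(false);
        final AtomicInteger inFlightWrites = new AtomicInteger();
        volatile boolean indexEnabled = false;

        // Poll until every write that started before read-only mode has finished.
        void drainInFlightWrites() throws InterruptedException {
            while (inFlightWrites.get() > 0) {
                Thread.sleep(1);
            }
        }
    }

    /** One attempt. Returns true only if every worker passed the barrier in time. */
    static boolean runOnce(List<Region> regions, long timeoutMs) throws InterruptedException {
        if (regions.isEmpty()) {
            return true; // nothing to coordinate
        }
        int n = regions.size();
        CyclicBarrier barrier = new CyclicBarrier(n);
        CountDownLatch done = new CountDownLatch(n);
        AtomicBoolean failed = new AtomicBoolean(false);
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (Region r : regions) {
            pool.submit(() -> {
                try {
                    r.readOnly.set(true);        // stop accepting new writes
                    r.drainInFlightWrites();     // wait for in-flight writes
                    barrier.await(timeoutMs, TimeUnit.MILLISECONDS); // join the barrier
                    r.indexEnabled = true;       // index-related state change
                } catch (Exception e) {
                    failed.set(true);            // explicit failure or timeout
                } finally {
                    r.readOnly.set(false);       // cleanup: reset the write state
                    done.countDown();
                }
            });
        }
        // Coordinator side: wait for global completion, bounded by a timeout.
        boolean finished = done.await(timeoutMs * 2, TimeUnit.MILLISECONDS);
        pool.shutdownNow(); // interrupt any worker still draining
        return finished && !failed.get();
    }

    /** Retry until one attempt completes successfully or attempts run out. */
    static boolean runWithRetry(List<Region> regions, long timeoutMs, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (runOnce(regions, timeoutMs)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            regions.add(new Region());
        }
        System.out.println("index create ok: " + runWithRetry(regions, 1000, 3));
    }
}
```

In this model a region that never drains breaks the barrier for all workers, so no region flips its index state unless every region reached the barrier. A real implementation would need the equivalent guarantee from the Procedure framework itself.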
> Creating an index while a batch of rows is being written leads to missing
> rows in the index table
> -------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-2582
> URL: https://issues.apache.org/jira/browse/PHOENIX-2582
> Project: Phoenix
> Issue Type: Bug
> Reporter: Thomas D'Silva
>
> If we create an index while we are upserting rows to the table, it's possible
> we can miss writing corresponding rows to the index table.
> If a region server is writing a batch of rows and we create an index just
> before the batch is written we will miss writing that batch to the index
> table. This is because we run the initial UPSERT SELECT to populate the index
> with an SCN that we get from the server which will be before the timestamp
> the batch of rows is written.
> We need to figure out if there is a way to determine that all pending batches
> have been written before running the UPSERT SELECT to do the initial index
> population.
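The race described in the issue can be illustrated with a small timeline model. This is a simplified sketch, not Phoenix code: Write, rowsMissingFromIndex, and all timestamp values are hypothetical. It assumes a batch only maintains the index itself if it started after the index was created, and that the initial population only copies rows whose timestamp is at or before the SCN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified timeline model of the race; these are not Phoenix classes.
class MissedIndexRowsSketch {

    /** Hypothetical record of one batch write: when it started and its cell timestamp. */
    static class Write {
        final String rowKey;
        final long startedAt;
        final long committedAt;

        Write(String rowKey, long startedAt, long committedAt) {
            this.rowKey = rowKey;
            this.startedAt = startedAt;
            this.committedAt = committedAt;
        }
    }

    /** Rows that neither the writer nor the initial population puts into the index. */
    static List<String> rowsMissingFromIndex(List<Write> writes, long indexCreatedAt, long scn) {
        List<String> missing = new ArrayList<>();
        for (Write w : writes) {
            // Assumption: a batch that started before the index existed does not write index rows.
            boolean maintainedByWriter = w.startedAt >= indexCreatedAt;
            // Assumption: the initial population reads as of the SCN, so it sees only older cells.
            boolean copiedByPopulation = w.committedAt <= scn;
            if (!maintainedByWriter && !copiedByPopulation) {
                missing.add(w.rowKey);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Write> writes = Arrays.asList(
            new Write("a", 10, 20),    // written well before the index: copied by population
            new Write("b", 90, 110),   // in flight when the index is created: lost
            new Write("c", 105, 120)); // started after the index exists: writer maintains it
        System.out.println(rowsMissingFromIndex(writes, 100, 100)); // prints [b]
    }
}
```

Under these assumptions, only the batch that straddles the index creation is lost, which is why waiting for pending batches to finish before choosing the SCN would close the gap.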
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)