[ 
https://issues.apache.org/jira/browse/PHOENIX-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182130#comment-17182130
 ] 

Lars Hofhansl commented on PHOENIX-6090:
----------------------------------------

I think the problem is that
 * global indexes should not have the lock while the remote writes are 
happening (since that can cause transitive stalling of region servers)
 * if there are concurrent writes to the same rows, global consistent indexes 
just accept the "corruption" (which is unverified anyway) and let read-repair 
handle it
 * local indexes do not have read-repair (nor should they) and they need that 
lock to be held - holding the locks for local indexes is ok since they do not 
perform any remote operations.

So we are a bit at odds here. Solutions:
 * add read-repair to local indexes (I would strongly advise against that)
 * keep holding the locks if there is at least one local index involved (but 
now the global remote operations are done inside of a lock), perhaps document 
to avoid mixing the two on the same table...?
 * (somehow) separate the write paths.
 * or figure out how to make local indexes consistent without holding locks in 
other ways.
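
To make the second option concrete, here is a minimal sketch of the decision it implies. This is not Phoenix code: the class IndexLockPolicySketch, the IndexType enum and both method names are made up for illustration, and the real write path is considerably more involved. It only shows the trade-off: once a local index is present the lock stays held, and if a global index is on the same table its remote writes end up inside that lock.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in Phoenix.
class IndexLockPolicySketch {

    enum IndexType { GLOBAL, LOCAL }

    // Option 2 from the list above: keep holding the row locks while the
    // index writes happen if at least one local index is involved.
    static boolean holdLocksDuringIndexWrites(List<IndexType> indexes) {
        return indexes.contains(IndexType.LOCAL);
    }

    // The downside of option 2: with a mix of local and global indexes,
    // the remote writes for the global indexes are done inside the lock.
    static boolean remoteWritesUnderLock(List<IndexType> indexes) {
        return holdLocksDuringIndexWrites(indexes)
                && indexes.contains(IndexType.GLOBAL);
    }

    public static void main(String[] args) {
        List<IndexType> mixed = List.of(IndexType.GLOBAL, IndexType.LOCAL);
        System.out.println("hold locks: " + holdLocksDuringIndexWrites(mixed));
        System.out.println("remote writes under lock: " + remoteWritesUnderLock(mixed));
    }
}
```

A table with only global indexes releases the locks before the remote writes, a table with only local indexes holds them without any remote work, and only the mixed case hits the problematic combination, which is why documenting against mixing the two might be enough.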

 

> Local indexes get out of sync after changes for global consistent indexes
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-6090
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6090
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.15.0, 5.1.0, 4.16.0
>            Reporter: Lars Hofhansl
>            Assignee: Kadir OZDEMIR
>            Priority: Blocker
>             Fix For: 5.1.0, 4.15.1, 4.16.0
>
>         Attachments: 6090-test-4.x.txt, 6090-test-v2-4.x.txt
>
>
> {code:java}
> > select /*+ NO_INDEX */ count(*) from test;
> +----------+
> | COUNT(1) |
> +----------+
> | 522244   |
> +----------+
> 1 row selected (1.213 seconds)
> > select count(*) from test;
> +----------+
> | COUNT(1) |
> +----------+
> | 522245   |
> +----------+
> 1 row selected (1.23 seconds)
> {code}
>  
> This was after I did some inserts and a bunch of splits (but not in parallel).
> It's not yet clear under exactly what circumstances this happens, just that 
> after a while it does.
> This is Phoenix built from master and HBase built from branch-2.3. (Client 
> and server versions of HBase match.)
> I've since tried with Phoenix 4.x and see the same issue - also see attached 
> tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
