[ 
https://issues.apache.org/jira/browse/PHOENIX-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075513#comment-14075513
 ] 

Jeffrey Zhong commented on PHOENIX-1112:
----------------------------------------

Thanks [~giacomotaylor] for the reviews!

{quote}
I think we can get away with just one additional column in SYSTEM.CATALOG. I 
don't think we need INDEX_NEED_PARTIALLY_REBUILD. Just use a 
INDEX_DISABLE_TIMESTAMP value of 0 or null to know that a rebuild is not 
necessary.
{quote}
You're right. I was thinking of a use case where we temporarily disable the 
rebuild of a disabled index. If we overload the INDEX_DISABLE_TIMESTAMP column, 
we would have to rebuild the whole index because its value will be 0 at that 
point. What do you think? I can remove INDEX_NEED_PARTIALLY_REBUILD if such a 
use case isn't needed.
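
To make the single-column proposal concrete, here is a minimal sketch of how one INDEX_DISABLE_TIMESTAMP value could drive the rebuild decision. It is not taken from the patch: the class and method names are hypothetical, and it assumes the column is read as a nullable long where 0 or null means "no rebuild needed".

```java
// Hypothetical helper, not part of the patch: interprets the value stored in
// INDEX_DISABLE_TIMESTAMP under the "0 or null means no rebuild" convention.
final class IndexRebuildDecision {
    private IndexRebuildDecision() {}

    /** A partial rebuild is pending only when a positive disable timestamp is recorded. */
    static boolean needsPartialRebuild(Long indexDisableTimestamp) {
        return indexDisableTimestamp != null && indexDisableTimestamp > 0L;
    }

    /**
     * Start of the time range to replay into the index, or -1 when no partial
     * rebuild is pending (the -1 sentinel is an assumption of this sketch).
     */
    static long rebuildStartTime(Long indexDisableTimestamp) {
        return needsPartialRebuild(indexDisableTimestamp) ? indexDisableTimestamp : -1L;
    }
}
```

With only this column, pausing the rebuild would mean clearing the value, which also discards the point of failure. That is the case the separate flag was meant to cover.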

{quote}
Did you run into any issues opening a Phoenix JDBC connection from the 
server-side in MetaDataRegionObserver? It would add a new dependency on the 
antlr jar being available on the server-side. Plus, is everything available 
from a coprocessor that we need (i.e. can it act just like an HBase client)?
{quote}
That's a good point. I can add the antlr dependency to the phoenix-core jar. 
Each RS can act as an HBase client. 

{quote}
Is the change from calling 
recoveryWriter.writeAndKillYourselfOnFailure(indexUpdates) to unconditionally 
calling recoveryWriter.write(indexUpdates) intentional?
{quote}
Yes, that's intentional. The reason we abort the server in the normal write 
path is that we write updates to the WAL first, then send them to the index 
region server and commit the changes on the current data region server. Since 
the changes are already in the WAL and we have to roll forward, we can only 
abort the RS to avoid inconsistent reads between the index and data regions. 

During recovery, there is no new WAL and the data region isn't online (there 
is no inconsistency issue because only the index region is online), so there 
is no need to abort the data region server: the data region is already offline 
anyway.

During the whole recovered-edits replay, the index region will see at most one 
new change, or none at all. This is fine because in the normal situation the 
index region can be one change ahead of the data region. Therefore, when we 
fail to update the index during recovery, we can just let the exception bubble 
up to fail the data region open; the region will then be reassigned elsewhere 
and the WAL edits replay retried later.
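
The difference between the two write paths can be sketched roughly as follows. This is a simplified illustration, not the patch code: the IndexWriter and Abortable interfaces here are hypothetical stand-ins, and index updates are modeled as plain strings.

```java
import java.io.IOException;
import java.util.List;

// Simplified illustration of the two failure policies discussed above.
final class RecoveryWritePolicySketch {
    private RecoveryWritePolicySketch() {}

    /** Hypothetical stand-in for the component that sends updates to the index region. */
    interface IndexWriter {
        void write(List<String> indexUpdates) throws IOException;
    }

    /** Hypothetical stand-in for the region server's abort hook. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Normal write path: the edits are already in the WAL, so a failed index
     * update aborts the server rather than leaving index and data inconsistent.
     */
    static void writeOrAbort(IndexWriter writer, Abortable server, List<String> indexUpdates) {
        try {
            writer.write(indexUpdates);
        } catch (IOException e) {
            server.abort("Could not update the index", e);
        }
    }

    /**
     * Recovery path: the data region is not online yet, so the exception simply
     * propagates, the region open fails, and the replay is retried elsewhere.
     */
    static void writeDuringRecovery(IndexWriter writer, List<String> indexUpdates)
            throws IOException {
        writer.write(indexUpdates);
    }
}
```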

{quote}
Can you please use static constants for config parameter names (define them in 
QueryServices with the others) and static constants for default values (define 
them in QueryServicesOptions)?
Would you mind filing a subtask to update the secondary index docs?
{quote}
Sure, let me do that. Thanks.
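
For reference, a rough sketch of what the constants could look like. The constant names are assumptions of mine, and a default of true is inferred from the issue description (the setting is only needed to turn the rebuild off). To keep the sketch self-contained, java.util.Properties stands in for the HBase Configuration object.

```java
import java.util.Properties;

// Sketch only: in the patch the key would live in QueryServices and the
// default in QueryServicesOptions; both constant names here are hypothetical.
final class IndexRebuildConfigSketch {
    private IndexRebuildConfigSketch() {}

    /** Config key, as given in the issue description. */
    static final String INDEX_FAILURE_HANDLING_REBUILD_ATTRIB =
            "phoenix.index.failure.handling.rebuild";

    /** Assumed default: automatic partial rebuild is on unless disabled. */
    static final boolean DEFAULT_INDEX_FAILURE_HANDLING_REBUILD = true;

    /** Reads the flag, falling back to the default when the key is absent. */
    static boolean isAutoRebuildEnabled(Properties conf) {
        String value = conf.getProperty(INDEX_FAILURE_HANDLING_REBUILD_ATTRIB);
        return value == null
                ? DEFAULT_INDEX_FAILURE_HANDLING_REBUILD
                : Boolean.parseBoolean(value.trim());
    }
}
```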
  
 


> Atomically rebuild index partially when index update fails  
> ------------------------------------------------------------
>
>                 Key: PHOENIX-1112
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1112
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: Phoenix-1112.patch
>
>
> This is a short-term workaround and a safe approach. Currently we disable an 
> index when an index update fails (it is still possible to bring down the 
> whole cluster). After an index is disabled, a human needs to be involved to 
> rebuild the entire index, which may not be ideal.
> The patch adds support for automatically rebuilding a disabled index 
> partially, from where it failed. In addition, it removes the RS abort during 
> WAL recovery to prevent chained failures, because the abort isn't necessary.
> To disable automatic rebuilding of a failed index, add the following 
> configuration to hbase-site.xml:
> {noformat}
> <property>
>    <name>phoenix.index.failure.handling.rebuild</name>
>    <value>false</value>
> </property>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
