[
https://issues.apache.org/jira/browse/PHOENIX-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075513#comment-14075513
]
Jeffrey Zhong commented on PHOENIX-1112:
----------------------------------------
Thanks [~giacomotaylor] for the reveiws!
{quote}
I think we can get away with just one additional column in SYSTEM.CATALOG. I
don't think we need INDEX_NEED_PARTIALLY_REBUILD. Just use a
INDEX_DISABLE_TIMESTAMP value of 0 or null to know that a rebuild is not
necessary.
{quote}
You're right. I was thinking a use case to temporally disable rebuild of a
disabled index. If we overload the column INDEX_DISABLE_TIMESTAMP, we have to
rebuild the whole index because its value will be 0 at that point. What do you
think? I can remove INDEX_NEED_PARTIALLY_REBUILD if such use case isn't needed.
{quote}
Did you run into any issues opening a Phoenix JDBC connection from the
server-side in MetaDataRegionObserver? It would add a new dependency on the
antlr jar being available on the server-side. Plus, is everything available
from a coprocessor that we need (i.e. can it act just like an HBase client)?
{quote}
That's a good point. I can add the dependency on antlar for phone-core jar.
Each RS can act as an HBase client.
{quote}
Is the change from calling
recoveryWriter.writeAndKillYourselfOnFailure(indexUpdates) to unconditionally
calling recoveryWriter.write(indexUpdates) intentional?
{quote}
Yes, that's by intention. The reason we abort server during normal write path
is that we write updates into WAL firstly, then send them to index region
server and commit changes on current data region server. Since changes are
already in WAL and we have to roll-forward, we can only abort RS to avoid
inconsistency read between index & data region.
While in recovery, no new WAL and data region isn't online(no inconsistency
issue because only index region is online) so no need to abort data region
server again because data region is already offline anyway.
During the whole recovered edits replay, Index region at most will see one new
change nor at all. This is all right because in normal situation index region
can have one change ahead of data region. Therefore when we failed to update
index during recovery, we can just let exception bubble up to fail the data
region open and the region will be re-assigned somewhere and retry WAL edits
replay later.
{quote}
Can you please use static constants for config parameter names (define them in
QueryServices with the others) and static constants for default values (define
them in QueryServicesOptions)?
Would you mind filing a subtask to update the secondary index docs?
{quote}
Sure, let me do that. Thanks.
> Atomically rebuild index partially when index update fails
> ------------------------------------------------------------
>
> Key: PHOENIX-1112
> URL: https://issues.apache.org/jira/browse/PHOENIX-1112
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Attachments: Phoenix-1112.patch
>
>
> This is a short-term work around & safe approach. Currently we disable an
> index when index update failed(still possible bring down the whole cluster).
> After an index is disable, human needs to be involved to rebuild entire index
> which maybe not ideal.
> The patch adds the support to automatically rebuild an disable index
> partially from where it failed. In addition, it removes RS abort during WAL
> recovery to prevent chain failures because we don't have to.
> To disable automatically rebuilding failed index, add the following
> configuration into hbase-site.xml:
> {noformat}
> <property>
> <name>phoenix.index.failure.handling.rebuild</name>
> <value>false</value>
> </property>
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)