[
https://issues.apache.org/jira/browse/PHOENIX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417753#comment-15417753
]
James Taylor commented on PHOENIX-3165:
---------------------------------------
I agree that mucking with the system catalog in the HBase shell is not the
right approach if it becomes corrupted. It's safest to rely on a backup/restore
mechanism to put the system catalog back into a known/good state IMHO. For
particular scenarios in which the SYSTEM.CATALOG is being updated across many
rows (such as during upgrade), I think PHOENIX-808 will be a good, simple, and
quick to implement safeguard.
Corruption can take many forms, though. I think it's important to understand
the root cause of the corruption, as IMHO prevention is the best medicine. If a
system interacts with Phoenix at the HBase level, this is very dangerous as
that system will not know if it's changing the table in an invalid manner. Best
to have Phoenix-level APIs instead that can guarantee that the system catalog
is kept in a valid state with commits being performed transactionally.
Another approach would be to have an RDBMS-style referential integrity check
(https://en.wikipedia.org/wiki/Referential_integrity) to prevent invalid states
from being entered. This would require, of course, that tools mucking with the
SYSTEM.CATALOG go through APIs that check integrity. This would be a pretty big
undertaking and it's typically the first thing that a real installation
disables because it adds too much overhead. It also wouldn't provide all the
integrity checks we need with the denormalization we do. In theory, we could
enhance our integrity checks to be able to express these and include them in
the check. This would be a very big undertaking.
bq. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM table), run
the tool, and have a completely correct recreation of SYSTEM.CATALOG available
at the end of its execution.
Unfortunately, that's not possible across all the features of Phoenix:
- The SYSTEM.CATALOG has Phoenix table definitions for all tenants in the form
of views. These views are essentially unbounded - for example, a time-series
metric system such as Argus may have 10M of them. Other use cases may have
multiple per user of a system. There's no other place this information can be
retrieved or derived from.
- The SYSTEM.CATALOG may vary over time. A client can connect at an earlier
time stamp with our CURRENT_SCN capability and see the version that was in
place at that time which may be different than the latest.
- Updating HBase metadata with every change to the SYSTEM.CATALOG would put a
huge drag on the system. If we're going to do something like that, better to
change the design and keep the system-of-record in zookeeper instead.
- Because we need updates to the system catalog to have all or none commit
behavior (i.e. a DDL operation should succeed completely or on failure have
made no change), we store both column and table information in the same table
(in contiguous rows). We also store view and index metadata in the table. If
the table becomes corrupt, it'd be potentially ambiguous on how to fix it. In
theory, I suppose, a tool could let the user choose between the possible
choices it'd make to fix it.
- Since the SYSTEM.CATALOG table is essentially data, corruption may mean data
loss. You can't recover from this (other than by restoring from a backup). I
don't think guessing or default values that are loss would be viable. In
theory, the tool could ask they user what value they'd like to use, but if even
a small percentage of 10M rows are corrupt, I don't think this is feasible.
For SYSTEM.SEQUENCE corruption, we're in a similar, but more precarious
situation. If any attempts to fix sequences cause sequences to no longer be
monotonically increasing, then user data can start to be corrupted. It'd be a
bit scary to have an automated system drive this. Might need to fallback to a
manual approach here, as you might need to look at user data (and Phoenix
wouldn't know which data) to know what to reset the current value of a sequence
to.
> System table integrity check and repair tool
> --------------------------------------------
>
> Key: PHOENIX-3165
> URL: https://issues.apache.org/jira/browse/PHOENIX-3165
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Andrew Purtell
> Priority: Critical
>
> When the Phoenix system tables become corrupt recovery is a painstaking
> process of low level examination of table contents and manipulation of same
> with the HBase shell. This is very difficult work providing no margin of
> safety, and is a critical gap in terms of usability.
> At the OS level, we have fsck.
> At the HDFS level, we have fsck (integrity checking only, though)
> At the HBase level, we have hbck.
> At the Phoenix level, we lack a system table repair tool.
> Implement a tool that:
> - Does not depend on the Phoenix client.
> - Supports integrity checking of SYSTEM tables. Check for the existence of
> all required columns in entries. Check that entries exist for all Phoenix
> managed tables (implies Phoenix should add supporting advisory-only metadata
> to the HBase table schemas). Check that serializations are valid.
> - Supports complete repair of SYSTEM.CATALOG and recreation, if necessary, of
> other tables like SYSTEM.STATS which can be dropped to recover from an
> emergency. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM
> table), run the tool, and have a completely correct recreation of
> SYSTEM.CATALOG available at the end of its execution.
> - To the extent we have or introduce cross-system-table invariants, check
> them and offer a repair or reconstruction option.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)