[
https://issues.apache.org/jira/browse/PHOENIX-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178452#comment-15178452
]
Enis Soztutar commented on PHOENIX-2737:
----------------------------------------
Unfortunately, HBCK has been used extensively in cases where a problem with
region boundaries is reported. How often do we run into such situations? I
would say quite often in older 0.98 code bases. Recent 1.1+ is much better.
Having to rebuild the index after an HBCK run is acceptable as a short-term
solution, although not desirable. The problem is the severity of the
situation: if a problem with a region split happens, it will not only cause
some downtime for HBase, it will cause downtime until the index rebuild is
complete. If there is a lot of data to scan and a lot of local indexes (we
have seen people trying 50-60 index columns), this will cause a long downtime.
The other thing is that no matter how well we document it, users will not
remember to do it consistently every time, so queries will silently miss
data.
We were discussing two alternatives with Rajesh. Using separators for startKeys
is one, and the other is writing the startKey in every hfile.
Since region start keys are derived from actual primary keys, what you are
saying, if I interpret it right, is that we cannot rely on the separator byte
as is, because it will be in the data itself when varchar columns etc. are
used. In this case, we need a separator for the whole start key (a concat of
primary keys), not for the individual columns inside. So we have to use a
different escape anyway, no?
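
To make the ambiguity concrete, here is a minimal sketch. It is not Phoenix
code: the class name, the helper methods, and the choice of 0x00 as the
separator byte are assumptions made only for illustration. It builds a start
key from two varchar-like columns joined by the separator, appends the same
separator after the whole start key, and shows that scanning for the first
separator finds the column boundary rather than the end of the start key.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: class, SEP value, and key layout are assumptions, not Phoenix code.
final class SeparatorAmbiguity {
    static final byte SEP = 0x00; // assumed separator byte

    // Naive approach: treat everything before the first separator as the start key.
    static int naiveStartKeyLength(byte[] rowKey) {
        for (int i = 0; i < rowKey.length; i++) {
            if (rowKey[i] == SEP) {
                return i;
            }
        }
        return rowKey.length;
    }

    // Concatenate byte arrays in order.
    static byte[] concat(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) {
            len += p.length;
        }
        byte[] out = new byte[len];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sep = {SEP};
        // Start key made of two variable-length columns, "ab" and "cd".
        byte[] startKey = concat(utf8("ab"), sep, utf8("cd"));
        // Hypothetical local index row key: start key, separator, index data.
        byte[] indexRowKey = concat(startKey, sep, utf8("idx"));
        System.out.println("actual start key length: " + startKey.length);
        System.out.println("naive scan result: " + naiveStartKeyLength(indexRowKey));
    }
}
```

The naive scan stops inside the start key, which is why a separator for the
whole start key would need its own escaping scheme.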
The second solution I was suggesting is to write the start key of the region
into every hfile. This would happen in the regular flush / compaction and bulk
load paths. If we can do it consistently, then we will automatically know the
length of the start key to replace, and we do not need to depend on meta to
show us the split points, etc. HBASE-14511 allows us to do this, but it is
not yet committed. However, if we agree that this is the correct solution, we
can even just add this field in HBase proper (I don't think it will be
controversial to add a single field to the hfile meta in HBase).
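
As a rough sketch of why a stored start key removes the guesswork: if each
hfile carries the start key it was written with, the length of the prefix to
replace is known exactly. The class and method below are hypothetical and use
plain byte arrays; they do not use any HBase or Phoenix API, and how the
stored key would actually be read from the hfile is left out.

```java
import java.util.Arrays;

// Hypothetical helper, not HBase or Phoenix code.
final class StartKeyRewrite {
    // Replace the stored start key prefix of a row key with the current region start key.
    static byte[] replaceStartKey(byte[] rowKey, byte[] storedStartKey, byte[] currentStartKey) {
        if (Arrays.equals(storedStartKey, currentStartKey)) {
            return rowKey; // region boundaries unchanged, nothing to rewrite
        }
        int oldLen = storedStartKey.length;
        // The row key is expected to begin with the start key stored in the hfile.
        if (rowKey.length < oldLen
                || !Arrays.equals(Arrays.copyOfRange(rowKey, 0, oldLen), storedStartKey)) {
            throw new IllegalArgumentException("row key does not begin with the stored start key");
        }
        int suffixLen = rowKey.length - oldLen;
        byte[] out = new byte[currentStartKey.length + suffixLen];
        System.arraycopy(currentStartKey, 0, out, 0, currentStartKey.length);
        System.arraycopy(rowKey, oldLen, out, currentStartKey.length, suffixLen);
        return out;
    }
}
```

Because the old prefix length comes from the file itself, this works even when
the start key contains bytes that would otherwise look like a separator, and
it needs no lookup in hbase:meta.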
> Make sure local indexes work properly after fixing region overlaps by HBCK.
> ---------------------------------------------------------------------------
>
> Key: PHOENIX-2737
> URL: https://issues.apache.org/jira/browse/PHOENIX-2737
> Project: Phoenix
> Issue Type: Bug
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Fix For: 4.8.0
>
>
> When there are region overlaps, hbck fixes them by moving the hfiles of the
> overlapping regions to a new region with the common key of the overlapping
> regions. In that case we might not properly replace the region start key in
> the HFiles, because we don't have any parent-child region relation in
> hbase:meta and so cannot identify the start key in the HFiles. To fix this,
> we need to add a separator after the region start key so that we can easily
> identify the start key in an HFile without always touching hbase:meta. Then,
> when we create scanners for the store files, we can compare the region start
> key in the hfile with the current region start key and, if it has changed,
> just replace the old start key with the current region start key. During
> compaction we can properly replace the start key with the actual key values.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)