Viraj Jasani created HBASE-26466:
------------------------------------
Summary: Immutable timeseries usecase - Create new region rather
than split existing one
Key: HBASE-26466
URL: https://issues.apache.org/jira/browse/HBASE-26466
Project: HBase
Issue Type: Brainstorming
Reporter: Viraj Jasani
For the immutable data insertion use case (specifically time-series data), the
region split mechanism doesn't seem to provide good availability when the
ingestion rate is very high. When we ingest a lot of data, the region split
policy tries to split the given hot region based on its size (either the
combined size of all stores or the size of any single store exceeding the
configured max file size), if we consider the default {_}SteppingSplitPolicy{_}.
The latest hot region tends to receive all of the latest inserts. When the
region is split, the first half of the region (say daughterA) stays on the same
server, whereas the second half (daughterB) is moved out to another server in
the cluster, even though daughterB is likely to become the next hot region
because all new writes arrive sequentially and land in the second half. Hence,
once the new daughter regions are created, client traffic is redirected to
another server. Client requests pile up from the moment the region split is
triggered until the new daughters come online, and once that is done, the
client has to query meta for the updated daughter region and redirect its
traffic to the new server.
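To make the sequential-write argument concrete, here is a toy Java model (not HBase code) of one region being split while row keys increase monotonically. It assumes zero-padded timestamp row keys, so lexicographic order matches time order, and a hypothetical split point at row 500; `SplitSimulation`, `Region`, `split` and `countWrites` are illustrative names only. It ignores server placement and only shows which daughter receives the new writes.

```java
/**
 * Toy model (not HBase code): a region covers [startKey, endKey), where an
 * empty endKey means "unbounded". Row keys are zero-padded timestamps so that
 * lexicographic order matches time order, as in a sequential-write workload.
 */
final class SplitSimulation {

    record Region(String name, String startKey, String endKey) {
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    /** Zero-padded timestamp row key, e.g. 42 becomes "0000000042". */
    static String rowKey(long ts) {
        return String.format("%010d", ts);
    }

    /** Splits the parent at splitKey and returns [daughterA, daughterB]. */
    static java.util.List<Region> split(Region parent, String splitKey) {
        return java.util.List.of(
            new Region("daughterA", parent.startKey(), splitKey),
            new Region("daughterB", splitKey, parent.endKey()));
    }

    /** Counts how many rows with timestamps in [fromTs, toTs) land in the region. */
    static int countWrites(Region region, long fromTs, long toTs) {
        int n = 0;
        for (long ts = fromTs; ts < toTs; ts++) {
            if (region.contains(rowKey(ts))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Region hot = new Region("hot", "", "");
        // Pretend rows 0..999 were already written; split at the (hypothetical) midpoint row.
        var daughters = split(hot, rowKey(500));
        // All newer rows (1000..1999) sort after the split key.
        System.out.println("daughterA new writes: " + countWrites(daughters.get(0), 1000, 2000));
        System.out.println("daughterB new writes: " + countWrites(daughters.get(1), 1000, 2000));
    }
}
```

Running `main` reports 0 new writes for daughterA and 1000 for daughterB: every post-split insert targets daughterB, which is the region that gets moved to another server.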
If we could have a configurable region creation strategy that 1) keeps split
disabled for the given table, and 2) dynamically creates a new region with a
lexicographically higher start key on the same server and updates the existing
region's own boundary, the client would only have to look up meta once and
could continue ingestion without any SLA degradation caused by region split
transitions.
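A minimal sketch of point 2) of the proposal, assuming meta can be modelled as a sorted map from start key to region and that the hot region's end key is allowed to change (the open question raised further down in this issue). `RolloverMeta`, `RegionInfo` and `rollOver` are hypothetical names, not HBase APIs, and the sketch deliberately leaves out concurrency, WAL and persistence concerns.

```java
import java.util.TreeMap;

/**
 * Hypothetical "roll over instead of split" model. Meta is a sorted map keyed
 * by region start key; an empty endKey means "unbounded". Not HBase code.
 */
final class RolloverMeta {

    record RegionInfo(String name, String startKey, String endKey, String server) {}

    private final TreeMap<String, RegionInfo> meta = new TreeMap<>();
    private int nextId = 1;

    RolloverMeta(String server) {
        // A single region covering the whole key space, as for a fresh table.
        meta.put("", new RegionInfo("region-0", "", "", server));
    }

    /** Finds the region whose [startKey, endKey) range contains the row. */
    RegionInfo locate(String row) {
        return meta.floorEntry(row).getValue(); // "" is always present, so never null
    }

    /**
     * Instead of splitting, shrinks the hot region's end key to boundaryKey and
     * creates a new region [boundaryKey, oldEndKey) on the SAME server.
     * Assumption: the caller picks a boundaryKey greater than every row already
     * written to the hot region (not checked here).
     */
    RegionInfo rollOver(RegionInfo hot, String boundaryKey) {
        boolean afterStart = boundaryKey.compareTo(hot.startKey()) > 0;
        boolean beforeEnd = hot.endKey().isEmpty() || boundaryKey.compareTo(hot.endKey()) < 0;
        if (!afterStart || !beforeEnd) {
            throw new IllegalArgumentException("boundary key outside the region's range");
        }
        RegionInfo shrunk = new RegionInfo(hot.name(), hot.startKey(), boundaryKey, hot.server());
        RegionInfo fresh = new RegionInfo("region-" + nextId++, boundaryKey, hot.endKey(), hot.server());
        meta.put(shrunk.startKey(), shrunk); // replaces the old entry (same start key)
        meta.put(fresh.startKey(), fresh);
        return fresh;
    }

    int regionCount() {
        return meta.size();
    }

    public static void main(String[] args) {
        RolloverMeta m = new RolloverMeta("rs1");
        RegionInfo hot = m.locate("0000000999");
        m.rollOver(hot, "0000001000");
        System.out.println(m.locate("0000000999")); // region-0, now ending at 0000001000
        System.out.println(m.locate("0000001500")); // region-1, still on rs1
    }
}
```

The design choice to copy `server` from the hot region is what keeps ingestion on the same server: only the boundary metadata changes, and no region has to be reopened elsewhere.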
Note: a region split might also run into complications that require the
procedure to be rolled back from some step or to continue with internal
retries, further delaying ingestion from clients.
There are some complications around updating a live region's start and end
keys, as this key range is immutable. We could brainstorm ideas around making
it optionally mutable and any issues that would arise. For instance, a client
might continue writing data to the region whose end key has been updated, but
those writes will fail, and hence the client will look up meta for the updated
key-space range of the table.
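The failure-and-retry path described above can be modelled as a client-side location cache that is refreshed when the server rejects a row outside the region's current range. This is a hypothetical sketch: `StaleCacheClient`, `Range`, `serverAccepts` and `write` are illustrative names rather than HBase client APIs, the "server" check simply consults the model's meta map, and a real client would also need retry limits and backoff.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of a client whose cached region location goes stale
 * after a region's end key is shrunk. Not HBase client code.
 */
final class StaleCacheClient {

    record Range(String name, String startKey, String endKey) {
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    /** Authoritative "meta": start key to current range. Must contain key "". */
    final TreeMap<String, Range> meta = new TreeMap<>();
    /** Client-side location cache, possibly stale. */
    final TreeMap<String, Range> cache = new TreeMap<>();
    int metaLookups = 0;

    /** Server side: rejects rows outside the region's CURRENT range. */
    boolean serverAccepts(String regionName, String row) {
        for (Range r : meta.values()) {
            if (r.name().equals(regionName)) {
                return r.contains(row);
            }
        }
        return false; // region unknown
    }

    private Range lookupMeta(String row) {
        metaLookups++;
        Range r = meta.floorEntry(row).getValue();
        cache.put(r.startKey(), r);
        return r;
    }

    /** Writes a row, refreshing the cached location once if the server rejects it. */
    String write(String row) {
        Map.Entry<String, Range> cached = cache.floorEntry(row);
        Range target = (cached != null && cached.getValue().contains(row))
            ? cached.getValue() : lookupMeta(row);
        if (!serverAccepts(target.name(), row)) {
            cache.remove(target.startKey()); // drop the stale entry
            target = lookupMeta(row);        // one extra meta lookup
        }
        return target.name();
    }

    public static void main(String[] args) {
        StaleCacheClient c = new StaleCacheClient();
        c.meta.put("", new Range("region-0", "", ""));
        c.write("0000000100"); // caches region-0 covering the whole key space
        // The hot region's end key is updated and region-1 is created.
        c.meta.put("", new Range("region-0", "", "0000001000"));
        c.meta.put("0000001000", new Range("region-1", "0000001000", ""));
        System.out.println(c.write("0000001500") + " after " + c.metaLookups + " meta lookups");
    }
}
```

In the `main` scenario, the write for row 0000001500 is first sent to region-0 via the stale entry and rejected, which costs exactly one more meta lookup; later rows in region-1 hit the refreshed cache without further lookups.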
--
This message was sent by Atlassian Jira
(v8.20.1#820001)