Lars George created HBASE-14864:
-----------------------------------
Summary: Add support for bucketing of keys into client library
Key: HBASE-14864
URL: https://issues.apache.org/jira/browse/HBASE-14864
Project: HBase
Issue Type: New Feature
Components: Client
Reporter: Lars George
This has been discussed and taught so many times that I believe it is time to
support it properly. The idea is to be able to assign an optional _bucketing_
strategy to a table, which translates the user-given row keys into a bucketed
version. This is done either by a simple count or by parts of the key. Possibly
some simple helper functionality should be provided to _compute_ bucket keys.
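To illustrate the "simple count" variant, here is a minimal sketch, not a proposed API. It assumes a one-byte bucket prefix derived from a hash of the user key; the class name {{CountBucketing}}, both method names, and the choice of {{Arrays.hashCode}} are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: count-based bucketing of row keys.
final class CountBucketing {
    /** Prepends a one-byte bucket id derived from a hash of the user key. */
    static byte[] toBucketedKey(byte[] userKey, int bucketCount) {
        if (bucketCount < 1 || bucketCount > 256) {
            throw new IllegalArgumentException("bucketCount must be in [1, 256]");
        }
        // floorMod keeps the bucket id non-negative even for negative hash codes.
        int bucket = Math.floorMod(Arrays.hashCode(userKey), bucketCount);
        byte[] bucketed = new byte[userKey.length + 1];
        bucketed[0] = (byte) bucket;
        System.arraycopy(userKey, 0, bucketed, 1, userKey.length);
        return bucketed;
    }

    /** Strips the bucket prefix again so the caller sees the original key. */
    static byte[] toUserKey(byte[] bucketedKey) {
        return Arrays.copyOfRange(bucketedKey, 1, bucketedKey.length);
    }
}
```

Because the bucket id is computed from the key itself, a point read can recompute the prefix without consulting any other state; only scans need to touch every bucket.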
For example, given a key {{<service>-<epoch>-<subgroup>-...}} you could imagine
defining a rule that takes the _epoch_ part and chunks it into, for example,
5-minute buckets. This allows small time series to be stored together and
makes reading (especially over many servers) much more efficient.
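A rough sketch of such a rule follows. It assumes the epoch is in seconds, that the service name itself contains no {{-}}, and that the bucket start is written in front of the epoch part; the class {{EpochBucketing}} and that key layout are illustrative assumptions only, not something defined by this issue.

```java
// Hypothetical helper, not part of HBase: time-based bucketing on the epoch part.
final class EpochBucketing {
    /** Returns the start of the time bucket that the key's epoch part falls into. */
    static long bucketStart(String userKey, long bucketSeconds) {
        // Expected shape: <service>-<epoch>[-<subgroup>-...]
        String[] parts = userKey.split("-", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("no epoch part in key: " + userKey);
        }
        long epoch = Long.parseLong(parts[1]);
        // Round down to the nearest multiple of the bucket width.
        return epoch - Math.floorMod(epoch, bucketSeconds);
    }

    /** Assumed layout: <service>-<bucketStart>-<epoch>-<subgroup>-... */
    static String toBucketedKey(String userKey, long bucketSeconds) {
        String[] parts = userKey.split("-", 2);
        return parts[0] + "-" + bucketStart(userKey, bucketSeconds) + "-" + parts[1];
    }
}
```

With a width of 300 seconds, every key of one service whose epoch lies in the same 5-minute window shares the same {{<service>-<bucketStart>}} prefix.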
The client would also support the proper scan logic to fan a scan out over the
buckets as needed. There may be an executor service (implicitly or explicitly
provided) that is used to fetch the original data, in user-visible order, from
the distributed buckets.
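The fan-out could look roughly like the sketch below: one task per bucket is submitted to an executor, and the per-bucket results (each already sorted) are merged back into user-visible order. Everything here is an assumption for illustration: {{BucketFanOutScan}}, the {{scanBucket}} callback standing in for a real per-bucket scan, and the use of strings with natural ordering instead of byte-wise row key comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical sketch, not part of HBase: parallel per-bucket scans plus ordered merge.
final class BucketFanOutScan {
    /** Current head row of one bucket's result stream during the merge. */
    private static final class Head {
        final String row;
        final Iterator<String> rest;
        Head(String row, Iterator<String> rest) { this.row = row; this.rest = rest; }
    }

    /** scanBucket must return the user keys of one bucket in sorted order. */
    static List<String> scanAll(int bucketCount, IntFunction<List<String>> scanBucket,
                                ExecutorService pool) {
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (int b = 0; b < bucketCount; b++) {
            final int bucket = b;
            tasks.add(() -> scanBucket.apply(bucket));
        }
        PriorityQueue<Head> heap = new PriorityQueue<>(Comparator.comparing((Head h) -> h.row));
        try {
            // invokeAll blocks until every bucket scan has completed.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                Iterator<String> it = f.get().iterator();
                if (it.hasNext()) heap.add(new Head(it.next(), it));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("bucket scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("bucket scan failed", e.getCause());
        }
        // k-way merge: repeatedly emit the smallest head row across all buckets.
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            merged.add(head.row);
            if (head.rest.hasNext()) heap.add(new Head(head.rest.next(), head.rest));
        }
        return merged;
    }
}
```

This version buffers each bucket's result fully before merging; a real client would more likely stream from per-bucket scanners, but the merge step would be the same.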
Note that this has been attempted a few times, to various extents, out in the
field, but then withered away. This is an essential feature that, when present
in the API, will make users consider bucketing earlier, instead of when it is
too late (for example, when hot spotting occurs).
The selected bucketing strategy and settings could be stored in the table
descriptor key/value pairs. This will allow any client to observe the strategy
transparently. If not set, the behaviour is the same as today, so the new
feature does not touch any critical code path and is fully client side. (It
could also be considered for, say, UI support if needed.)
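As a sketch of how a client might read those settings, the snippet below models the descriptor's key/value pairs as a plain {{Map<String, String>}}. The key names {{BUCKETING_STRATEGY}} and {{BUCKETING_COUNT}} and the class {{BucketingConfig}} are made up for illustration; an absent strategy key means "behave as today".

```java
import java.util.Map;

// Hypothetical sketch: reading bucketing settings from table descriptor values.
final class BucketingConfig {
    // Assumed key names; nothing in this issue fixes them.
    static final String STRATEGY_KEY = "BUCKETING_STRATEGY";
    static final String COUNT_KEY = "BUCKETING_COUNT";

    final String strategyClass; // null when bucketing is not configured
    final int bucketCount;

    private BucketingConfig(String strategyClass, int bucketCount) {
        this.strategyClass = strategyClass;
        this.bucketCount = bucketCount;
    }

    /** Builds the config from descriptor key/value pairs; unset means disabled. */
    static BucketingConfig fromDescriptor(Map<String, String> values) {
        String strategy = values.get(STRATEGY_KEY);
        if (strategy == null) {
            return new BucketingConfig(null, 1);
        }
        int count = Integer.parseInt(values.getOrDefault(COUNT_KEY, "1"));
        return new BucketingConfig(strategy, count);
    }

    boolean isEnabled() {
        return strategyClass != null;
    }
}
```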
The strategies are pluggable using classes, but a few default implementations
are supplied.
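One way the pluggable part could be shaped is sketched below: a small strategy interface, two example implementations, and a loader that instantiates a strategy by class name. The interface {{BucketingStrategy}}, its two methods, the {{Identity}} and {{HashPrefix}} classes, and the string-based keys are all assumptions for illustration, not a proposed final API.

```java
// Hypothetical sketch, not part of HBase: strategies pluggable by class name.
final class PluggableBucketing {
    public interface BucketingStrategy {
        String toBucketedKey(String userKey);
        String toUserKey(String bucketedKey);
    }

    /** Default when nothing is configured: keys pass through unchanged. */
    public static final class Identity implements BucketingStrategy {
        public String toBucketedKey(String userKey) { return userKey; }
        public String toUserKey(String bucketedKey) { return bucketedKey; }
    }

    /** Example implementation: "<bucket>:<key>" with a fixed number of buckets. */
    public static final class HashPrefix implements BucketingStrategy {
        private static final int BUCKETS = 8;
        public String toBucketedKey(String userKey) {
            return Math.floorMod(userKey.hashCode(), BUCKETS) + ":" + userKey;
        }
        public String toUserKey(String bucketedKey) {
            // The prefix is numeric, so the first ':' always ends it.
            return bucketedKey.substring(bucketedKey.indexOf(':') + 1);
        }
    }

    /** Instantiates a strategy via its public no-argument constructor. */
    static BucketingStrategy load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(BucketingStrategy.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load strategy " + className, e);
        }
    }
}
```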
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)