Peter Ebert created KUDU-2585:
---------------------------------
Summary: Custom Partitioning Schemes
Key: KUDU-2585
URL: https://issues.apache.org/jira/browse/KUDU-2585
Project: Kudu
Issue Type: New Feature
Reporter: Peter Ebert
In HBase or HDFS tables you can come up with a complex key design or
partitioning scheme (respectively) and build that logic into your application.
It would be nice to have more flexibility with Kudu beyond the range and hash
options currently provided.
One example where this would help, borrowed from the docs:
CREATE TABLE metrics (
    host STRING NOT NULL,
    metric STRING NOT NULL,
    time INT64 NOT NULL,
    value DOUBLE NOT NULL,
    PRIMARY KEY (host, metric, time)
);
Now let's say the hosts stored in Kudu belong to 2 Hadoop clusters, which I
happen to indicate as part of the hostname:
[c1dn1.domain.com|http://c1dn1.domain.com/] for cluster1 and
[c2dn1.domain.com|http://c2dn1.domain.com/] for cluster2. With hash
partitioning on host and enough distinct host values, a scan for a single
cluster might have to read all partitions, because that cluster's hosts will
be distributed across all of them.
If instead I could provide a UDF of some sort (or here even a simple substring
of the first two characters), I could group cluster1 into one or a few
partitions and skip reading any tablets for cluster2 when I do a scan.
So instead of hash(host) it would be something like hash(substr(host, 1, 2)).
Of course you could get more complex with a UDF: use the cluster prefix to pick
a group of tablets, then hash the remainder of the string mod 10 to distribute
the c1 hosts across 10 tablets, and so on.
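To make the idea concrete, here is a rough sketch in Python of what such a
partition function could look like. This is not Kudu code and not a proposed
API: the names partition_for and tablets_to_scan, the constant
TABLETS_PER_CLUSTER, and the use of CRC32 as the hash are all assumptions made
for illustration only.

```python
import zlib
from typing import List, Tuple

# Assumption: each cluster's hosts are spread over 10 tablets ("mod 10").
TABLETS_PER_CLUSTER = 10


def partition_for(host: str) -> Tuple[str, int]:
    """Map a hostname to a hypothetical (cluster prefix, bucket) partition."""
    prefix = host[:2]       # same as substr(host, 1, 2), e.g. "c1"
    remainder = host[2:]    # e.g. "dn1.domain.com"
    # CRC32 is only a stand-in for whatever hash the engine would really use.
    bucket = zlib.crc32(remainder.encode("utf-8")) % TABLETS_PER_CLUSTER
    return prefix, bucket


def tablets_to_scan(cluster_prefix: str) -> List[Tuple[str, int]]:
    """Partitions a scan restricted to one cluster would have to read."""
    return [(cluster_prefix, b) for b in range(TABLETS_PER_CLUSTER)]
```

With a function like this, a scan for cluster1 would only touch the 10
partitions returned by tablets_to_scan("c1"), and a row for
c2dn1.domain.com could never land in any of them.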