[
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538377#comment-17538377
]
Kadir OZDEMIR commented on PHOENIX-4757:
----------------------------------------
[~larsh] What if the PK cannot be rearranged because the first column in the PK
is used for multi-tenancy? It seems the discussion above focused on one
specific use case where the PK can be rearranged. Let us consider PK = (A, B, C),
where A is used for multi-tenancy, B is a monotonically increasing field such as
a TIMESTAMP or DATE, and the values of C are random. In this case (which is
general enough to cover many similar use cases), computing the salt bucket over
A and B only would address write hotspotting. It would also improve the
performance of queries that specify both A and B, since they would need to
visit only the one salt bucket corresponding to the given (A, B) value pair
rather than all salt buckets.
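The following Python sketch illustrates this trade-off. It is a hypothetical model, not Phoenix's SaltingUtil: the bucket count, the 0x00-separated key encoding and the use of CRC32 as the hash are all assumptions made for illustration. It compares salting over the full PK (A, B, C) with salting over the (A, B) prefix only.

```python
import zlib

BUCKETS = 16  # hypothetical bucket count for the illustration


def salt_byte(pk_values, salt_cols, bucket_count=BUCKETS):
    """Hash only the first `salt_cols` PK values into a bucket number.

    Simplified model (assumption): values are UTF-8 encoded and joined with a
    0x00 separator, and CRC32 stands in for Phoenix's real hash function.
    """
    key = b"\x00".join(v.encode("utf-8") for v in pk_values[:salt_cols])
    return zlib.crc32(key) % bucket_count


def buckets_touched(rows, salt_cols):
    """Return the set of salt buckets the given rows land in."""
    return {salt_byte(r, salt_cols) for r in rows}


# One tenant (A), one day (B), many random-ish C values.
same_a_b = [("tenant1", "2022-05-17", f"c{i}") for i in range(1000)]

# One tenant with a monotonically increasing B (one row per day).
one_tenant_over_time = [("tenant1", f"2022-05-{d:02d}", "c0") for d in range(1, 29)]

full_pk = buckets_touched(same_a_b, salt_cols=3)  # salt over (A, B, C)
prefix = buckets_touched(same_a_b, salt_cols=2)   # salt over (A, B) only
writes = buckets_touched(one_tenant_over_time, salt_cols=2)

print("query on (A, B), full-PK salt:", len(full_pk), "buckets")
print("query on (A, B), (A, B) salt:", len(prefix), "bucket(s)")
print("writes over 28 days, (A, B) salt:", len(writes), "buckets")
```

With the salt over (A, B), every row sharing an (A, B) pair falls into one bucket, so a query fixing A and B can target that single bucket, while one tenant's writes still spread across buckets as B increases. With the salt over the full PK, the same (A, B) rows are scattered, so the query has to visit every bucket.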
Today, when a data table is salted, its secondary indexes are salted as well,
and the salt is computed over the entire PK of the data table and over the
entire PK of the index table, respectively. PHOENIX-5259 suggests that index
tables should not inherit salting from data tables. We could allow salting to
be specified for the index table in order to disable or change its salting
scheme. I am in favor of implementing this Jira.
> composite key salt_buckets
> --------------------------
>
> Key: PHOENIX-4757
> URL: https://issues.apache.org/jira/browse/PHOENIX-4757
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.11.0
> Reporter: cmd
> Priority: Major
> Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
> user_id VARCHAR NOT NULL,
> event_type VARCHAR NOT NULL,
> event_time VARCHAR NOT NULL,
> event_msg VARCHAR NOT NULL,
> event_status VARCHAR NOT NULL,
> event_opt VARCHAR NOT NULL,
> CONSTRAINT my_pk PRIMARY KEY (user_id, event_type, event_time))
> SALT_BUCKETS=128;
> and my queries are:
> 1. select event_type, count(0) from user_events where user_id='xxxx'
> group by event_type
> 2. select count(0) from user_events where user_id='xxxx' and
> event_type='0101'
> 3. select * from user_events where user_id='xxxx' and event_type='0101' and
> event_time>'20180101' and event_time<'20180201' order by event_time
> limit 100 offset 50
> Query mix (share of concurrent queries):
> 1: 80%
> 2: 10%
> 3: 10%
> user_events data: 50 billion
> The salt could be computed by hashing one field or a subset of the fields of
> the primary key, with a grammar such as "SALT_BUCKETS(user_id)=4" or
> "SALT_BUCKETS(user_id,event_type)=4".
> ref:
>
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]
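To make the difference between the two proposed forms concrete, here is a simplified Python model of how many salt buckets each of the three queries in the description would have to visit. It assumes the reporter's proposed SALT_BUCKETS(...) column syntax, which is not existing Phoenix grammar, and treats a query as single-bucket only when equality predicates pin every salt-key column; real scan planning has more nuances.

```python
# Columns fixed by an equality predicate in each query from the description.
# event_time in query 3 is a range predicate, so it cannot pin a salt value.
QUERIES = {
    "q1": {"user_id"},
    "q2": {"user_id", "event_type"},
    "q3": {"user_id", "event_type"},
}

# Salting schemes to compare: (salt-key columns, bucket count).
# The column-list forms are the hypothetical syntax proposed in this issue.
SCHEMES = {
    "SALT_BUCKETS=128 (full PK)": (("user_id", "event_type", "event_time"), 128),
    "SALT_BUCKETS(user_id)=4": (("user_id",), 4),
    "SALT_BUCKETS(user_id,event_type)=4": (("user_id", "event_type"), 4),
}


def buckets_scanned(eq_cols, salt_cols, bucket_count):
    """A query can target a single bucket only if every salt-key column is
    fixed by an equality predicate; otherwise it must scan every bucket."""
    return 1 if set(salt_cols) <= set(eq_cols) else bucket_count


results = {
    name: {q: buckets_scanned(eq, cols, n) for q, eq in QUERIES.items()}
    for name, (cols, n) in SCHEMES.items()
}

for name, scanned in results.items():
    print(name, scanned)
```

Under this model only SALT_BUCKETS(user_id) lets query 1 (80% of the load) hit a single bucket. The cost is that all rows of one user_id then share a bucket, so a very active user could become a write hotspot again; adding event_type to the salt key spreads those writes but gives up single-bucket pruning for query 1.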