[
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617081#comment-16617081
]
Lars Hofhansl commented on PHOENIX-4757:
----------------------------------------
This is interesting!
Couple of questions/comments:
# How do secondary indexes play into this?
# How do Phoenix' stats (equal width guideposts) affect this?
# Rows that happen to hash into a salt bucket are *not* guaranteed on a single
region server. There might be still be multiple regions for data in this salt
buckets and those will get assigned to different region servers.
# I do prefer the SALTBUCKETS(key1, key2, ...) = N syntax. That allows any
subset of key columns to be salted.
# There is at most one hash per row. That is very limiting. I doubt optimizing
for *one* query is common, and if why not design the key accordingly, or write
the same data into another table, or use a secondary index?
For example the query above could be sped up by having the key be (id_2, id_1,
...) without SALTING.
I'm not saying this is a bad idea! I'm just cautioning about what specifically
we are trying to optimize for and whether there are more general ways to
achieve that.
> composite key salt_buckets
> --------------------------
>
> Key: PHOENIX-4757
> URL: https://issues.apache.org/jira/browse/PHOENIX-4757
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.11.0
> Reporter: cmd
> Priority: Major
> Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
> user_id VARCHAR NOT NULL,
> event_type VARCHAR NOT NULL,
> event_time VARCHAR NOT NULL
> event_msg VARCHAR NOT NULL
> event_status VARCHAR NOT NULL
> event_opt VARCHAR NOT NULL
> CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time))
> SALT_BUCKETS=128;
> and my query is:
> 1.select event_type,count(0) from us_population where user_id='xxxx' group
> by event_type
> 2.select count(0) from us_population where user_id='xxxx' and
> event_type='0101'
> 3.select * from us_population where user_id='xxxx' and event_type='0101' and
> event_time>'20180101' and event_time<'20180201' order by event_time limit
> 50,100
> Concurrency query ratio:
> 1:80%
> 2:10%
> 3:10%
> user_events data :50billion
> It can be a field/some fileds of the primary key salted by hash
> grammar with "SALT_BUCKETS(user_id)=4" or
> "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)