[ 
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617081#comment-16617081
 ] 

Lars Hofhansl commented on PHOENIX-4757:
----------------------------------------

This is interesting!

Couple of questions/comments:
 # How do secondary indexes play into this?
 # How do Phoenix' stats (equal width guideposts) affect this?
 # Rows that happen to hash into a salt bucket are *not* guaranteed on a single 
region server. There might be still be multiple regions for data in this salt 
buckets and those will get assigned to different region servers.
 # I do prefer the SALTBUCKETS(key1, key2, ...) = N syntax. That allows any 
subset of key columns to be salted.
 # There is at most one hash per row. That is very limiting. I doubt optimizing 
for *one* query is common, and if why not design the key accordingly, or write 
the same data into another table, or use a secondary index?
For example the query above could be sped up by having the key be (id_2, id_1, 
...) without SALTING.

I'm not saying this is a bad idea! I'm just cautioning about what specifically 
we are trying to optimize for and whether there are more general ways to 
achieve that.

 

> composite key salt_buckets
> --------------------------
>
>                 Key: PHOENIX-4757
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4757
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.11.0
>            Reporter: cmd
>            Priority: Major
>             Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
>  user_id VARCHAR NOT NULL,
>  event_type VARCHAR NOT NULL,
>  event_time VARCHAR NOT NULL
>  event_msg VARCHAR NOT NULL
>  event_status VARCHAR NOT NULL
>  event_opt VARCHAR NOT NULL
>  CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time)) 
> SALT_BUCKETS=128;
> and my query is:
>  1.select event_type,count(0) from us_population where user_id='xxxx' group 
> by event_type
>  2.select count(0) from us_population where user_id='xxxx' and 
> event_type='0101'
>  3.select * from us_population where user_id='xxxx' and event_type='0101' and 
> event_time>'20180101' and event_time<'20180201' order by event_time limit 
> 50,100
> Concurrency query ratio:
>  1:80%
>  2:10%
>  3:10% 
>  user_events data :50billion
>  It can be a field/some fileds of the primary key salted by hash
>  grammar with "SALT_BUCKETS(user_id)=4" or 
> "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>  
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to