[ 
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686892#comment-16686892
 ] 

Gerald Sangudi commented on PHOENIX-4757:
-----------------------------------------

[~jamestaylor] - I'm working through your questions. Meanwhile, here's another 
scenario:
{code:sql}
CREATE TABLE table1 (
id1 BIGINT NOT NULL,
id2 BIGINT NOT NULL,
val SMALLINT
)
CONSTRAINT pk PRIMARY KEY (id1, id2)
SALT_BUCKETS=16;

SELECT *
FROM table1
WHERE id1 = 5;
{code}
If the salting is on (id1, id2), then all 16 regions must be scanned, and all 
rows in those regions must be examined for id1=5.

If the salting is on (id1), then (a) we can compute the salt value sv(5) for 
id1=5, and (b) we can scan only regions and rows beginning with [salt=sv(5), 
id1=5]. Wouldn't this be beneficial?

I realize we could avoid the salting altogether, but suppose it is needed for 
other reasons, e.g. writes.

> composite key salt_buckets
> --------------------------
>
>                 Key: PHOENIX-4757
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4757
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.11.0
>            Reporter: cmd
>            Priority: Major
>             Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
>  user_id VARCHAR NOT NULL,
>  event_type VARCHAR NOT NULL,
>  event_time VARCHAR NOT NULL
>  event_msg VARCHAR NOT NULL
>  event_status VARCHAR NOT NULL
>  event_opt VARCHAR NOT NULL
>  CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time)) 
> SALT_BUCKETS=128;
> and my query is:
>  1.select event_type,count(0) from us_population where user_id='xxxx' group 
> by event_type
>  2.select count(0) from us_population where user_id='xxxx' and 
> event_type='0101'
>  3.select * from us_population where user_id='xxxx' and event_type='0101' and 
> event_time>'20180101' and event_time<'20180201' order by event_time limit 
> 50,100
> Concurrency query ratio:
>  1:80%
>  2:10%
>  3:10% 
>  user_events data :50billion
>  It can be a field/some fileds of the primary key salted by hash
>  grammar with "SALT_BUCKETS(user_id)=4" or 
> "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>  
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to