[ 
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675727#comment-16675727
 ] 

Gerald Sangudi commented on PHOENIX-4757:
-----------------------------------------

[~lhofhansl], thanks for your questions. My thoughts--
 # This new partitioning can be restricted to primary tables and disallowed 
for secondary indexes. For queries that use a secondary index, the only 
requirement is that the index can locate the primary record. 
Presumably today, the secondary index asks the primary table to retrieve a row, 
and the region and salting information is used to locate the right region.
 # I think we would disable automatic splitting and merging.
 # I think we would disable automatic splitting and merging. The main goal / 
benefit of this proposal is to be able to determine regions based on the 
partitioning.
 # Noted. Syntax TBD from the consensus feedback.
 # Good point. But hash partitioning is a fact of life at scale. The 
partitioning might support a whole set of queries. For the other (read) queries 
that do not match the partitioning, the performance should be the same as it is 
in Phoenix today, i.e. scan and merge multiple regions.
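
To make the last point concrete, here is a rough Python sketch of the routing idea. It is not Phoenix code: the names salt_bucket and buckets_to_scan are hypothetical, and CRC32 is only a stand-in for whatever hash Phoenix would actually use. It assumes the bucket is derived from the partitioning columns alone, so a query that fixes all of them with equality predicates touches one bucket, and any other query falls back to scanning and merging all buckets.

```python
import zlib


def salt_bucket(partition_values, num_buckets):
    """Map the partitioning column values to a bucket number.

    Hypothetical stand-in: CRC32 is used for illustration only and is
    not the hash function Phoenix uses for SALT_BUCKETS.
    """
    data = b"\x00".join(v.encode("utf-8") for v in partition_values)
    return zlib.crc32(data) % num_buckets


def buckets_to_scan(equality_predicates, partition_columns, num_buckets):
    """Return the list of buckets a query has to visit.

    equality_predicates maps column name -> literal for `col = 'value'`
    conditions. If every partitioning column is fixed, one bucket is
    enough; otherwise all buckets are scanned and merged.
    """
    if all(col in equality_predicates for col in partition_columns):
        values = [equality_predicates[col] for col in partition_columns]
        return [salt_bucket(values, num_buckets)]
    return list(range(num_buckets))


# Partitioned on user_id only: queries 1-3 from the issue all fix user_id.
single = buckets_to_scan({"user_id": "xxxx", "event_type": "0101"}, ["user_id"], 4)

# A query without user_id does not match the partitioning.
fan_out = buckets_to_scan({"event_type": "0101"}, ["user_id"], 4)
```

With the partitioning on user_id, `single` holds exactly one bucket, while `fan_out` holds all four, which is the same scan-and-merge behavior as salting today.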

> composite key salt_buckets
> --------------------------
>
>                 Key: PHOENIX-4757
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4757
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.11.0
>            Reporter: cmd
>            Priority: Major
>             Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
>  user_id VARCHAR NOT NULL,
>  event_type VARCHAR NOT NULL,
>  event_time VARCHAR NOT NULL,
>  event_msg VARCHAR NOT NULL,
>  event_status VARCHAR NOT NULL,
>  event_opt VARCHAR NOT NULL,
>  CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time)) 
> SALT_BUCKETS=128;
> and my queries are:
>  1.select event_type,count(0) from user_events where user_id='xxxx' group 
> by event_type
>  2.select count(0) from user_events where user_id='xxxx' and 
> event_type='0101'
>  3.select * from user_events where user_id='xxxx' and event_type='0101' and 
> event_time>'20180101' and event_time<'20180201' order by event_time limit 
> 50,100
> Concurrent query ratio:
>  1:80%
>  2:10%
>  3:10% 
>  user_events data: 50 billion
>  It should be possible to salt by the hash of one field or some fields of 
>  the primary key, with grammar such as "SALT_BUCKETS(user_id)=4" or 
> "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>  
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
