[jira] [Commented] (PHOENIX-4757) composite key salt_buckets

Gerald Sangudi (JIRA) Thu, 13 Sep 2018 15:36:39 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614129#comment-16614129
 ]


Gerald Sangudi commented on PHOENIX-4757:
-----------------------------------------

We have a similar requirement. From our discussion document:

*Background*

We make extensive use of multi-column rowkeys and 
[salting|https://phoenix.apache.org/salted.html] in our different apache 
phoenix deployments. We frequently perform group-by aggregations on these data 
along a specific dimension that would benefit from predictably partitioning the 
data along that dimension. 

 

*Proposal:*

We propose to add table metadata to allow schema designers to constrain salting 
to a subset of the rowkey, rather than the full rowkey as it is today. This 
will introduce a mechanism to partition data on a per-table basis along a 
single dimension without application changes or much change to the phoenix 
runtime logic. We expect this will result in substantially faster group-by’s 
along the salted dimension and negligible penalties elsewhere. This feature has 
also been proposed in PHOENIX-4757 where it was pointed out that partitioning 
and sorting data along different dimensions is a common pattern in other 
datastores as well.

Theoretically, it could cause hotspotting when querying along the salted 
dimension without the leading rowkey - that would be an anti-pattern.

 

*Usage Example*

*Current:*

*Schema:*

CREATE TABLE relationship (

id_1 BIGINT NOT NULL,

id_2 BIGINT NOT NULL,

other_key BIGINT NOT NULL,

val SMALLINT,

CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key)

)

SALT_BUCKETS=60;

 

*Query:*

Select id_2, sum(val)

>From relationship

Where id_1 in (2,3)

Group by id_2

 

*Explain:*

0: jdbc:phoenix:> EXPLAIN Select id_2, sum(val) From relationship Where id_1 in 
(2,3) Group by id_2 ;

+-----------------------------------------------------------------------------------------+--------+

|                                          PLAN    | EST_BY |

+-----------------------------------------------------------------------------------------+--------+

| CLIENT 60-CHUNK PARALLEL 60-WAY SKIP SCAN ON 120 KEYS OVER RELATIONSHIP [0,2] 
- [59,3]  | null |

|     SERVER AGGREGATE INTO DISTINCT ROWS BY [ID_2]                             
          | null |

| CLIENT MERGE SORT                                                             
          | null |

+-----------------------------------------------------------------------------------------+--------+

3 rows selected (0.048 seconds)

 

In this case, although the group by is performed on both the client and 
regionserver, almost all of the actual grouping happens on the client because 
the id_2’s are randomly distributed across the regionservers. As a result, a 
lot of unnecessary data is serialized to the client and grouped serially there. 
This can become quite material with large resultsets.

 

*Proposed:*

*Schema:*

CREATE TABLE relationship (

id_1 BIGINT NOT NULL,

id_2 BIGINT NOT NULL,

other_key BIGINT NOT NULL,

val SMALLINT,

CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key),

SALT_BUCKETS=60,

SALT_COLUMN = id_2

);

 

*Query (unchanged):*

Select id_2, sum(val)

>From relationship

Where id_1 in (2,3)

Group by id_2

 

*Explain (unchanged)*

 

Under the proposal, the data are merely partitioned so that all rows containing 
the same id_2 are on the same regionserver, the above query will perform almost 
all of the grouping in parallel on the regionservers. No special hint or 
changes to the query plan would be required to benefit. Tables would need to be 
re-salted to take advantage of the new functionality.

 

*Technical changes proposed to phoenix:*
 * Create a new piece of table-level metadata: SALT_COLUMN. SALT_COLUMN will 
instruct the salting logic to generate a salt-byte based only on the specified 
column. If unspecified, it will behave as it does today and default to salting 
the entire rowkey. This metadata may be specified only when the table is 
created and may not be modified. The specified column must be part of the 
rowkey.  
 * Modify all callers of 
[getSaltingByte|https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/schema/SaltingUtil.java#L77](byte[]
 value, int offset, int length, int bucketNum) to consistently leverage the new 
metadata.
 * Tests
 * Docs





*Design points:*

*One salt column vs multiple salt columns:* Based on the existing signature for 
getSaltingByte, it seems simpler to only support a single SALT_COLUMN rather 
than multiple arbitrary SALT_COLUMNS. Known use-cases are completely supported 
by a single column.

*Syntax:*  PHOENIX-4757 suggests an alternate, less verbose syntax for defining 
the salt bucket. The SALT_COLUMN syntax is suggested for clarity and 
consistency with other Phoenix table options.

 

*Future Enhancements (not in scope)*

Different aspects of the query execution runtime could take advantage of new 
metadata and implied knowledge that the data are partitioned in a predictable 
manner. For example:
 * It could be that client side grouping is completely unnecessary in cases 
where the SALT_COLUMN is part of the group-by expression.
 * A query that contains a literal equality predicate for the SALT_COLUMN can 
be isolated to a single regionserver, rather than broadcast to all 
regionservers.
 * A client-side merge-sort-join based on the SALT_COLUMN could optimize 
organization of merges. 
 * Similarly, a server-side hash join could distribute only ‘necessary’ 
portions of the hash table to each regionserver.

If additional advantages of these types come for free, then that’s great but 
can be follow on enhancements from the initial commit.

> composite key salt_buckets
> --------------------------
>
>                 Key: PHOENIX-4757
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4757
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.11.0
>            Reporter: cmd
>            Priority: Major
>             Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
>  user_id VARCHAR NOT NULL,
>  event_type VARCHAR NOT NULL,
>  event_time VARCHAR NOT NULL
>  event_msg VARCHAR NOT NULL
>  event_status VARCHAR NOT NULL
>  event_opt VARCHAR NOT NULL
>  CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time)) 
> SALT_BUCKETS=128;
> and my query is:
>  1.select event_type,count(0) from us_population where user_id='xxxx' group 
> by event_type
>  2.select count(0) from us_population where user_id='xxxx' and 
> event_type='0101'
>  3.select * from us_population where user_id='xxxx' and event_type='0101' and 
> event_time>'20180101' and event_time<'20180201' order by event_time limit 
> 50,100
> Concurrency query ratio:
>  1:80%
>  2:10%
>  3:10% 
>  user_events data :50billion
>  It can be a field/some fileds of the primary key salted by hash
>  grammar with "SALT_BUCKETS(user_id)=4" or 
> "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>  
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (PHOENIX-4757) composite key salt_buckets

Reply via email to