[
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614129#comment-16614129
]
Gerald Sangudi commented on PHOENIX-4757:
-----------------------------------------
We have a similar requirement. From our discussion document:
*Background*
We make extensive use of multi-column rowkeys and
[salting|https://phoenix.apache.org/salted.html] in our different apache
phoenix deployments. We frequently perform group-by aggregations on these data
along a specific dimension that would benefit from predictably partitioning the
data along that dimension.
*Proposal:*
We propose to add table metadata to allow schema designers to constrain salting
to a subset of the rowkey, rather than the full rowkey as it is today. This
will introduce a mechanism to partition data on a per-table basis along a
single dimension without application changes or much change to the phoenix
runtime logic. We expect this will result in substantially faster group-by’s
along the salted dimension and negligible penalties elsewhere. This feature has
also been proposed in PHOENIX-4757 where it was pointed out that partitioning
and sorting data along different dimensions is a common pattern in other
datastores as well.
Theoretically, it could cause hotspotting when querying along the salted
dimension without the leading rowkey - that would be an anti-pattern.
*Usage Example*
*Current:*
*Schema:*
CREATE TABLE relationship (
id_1 BIGINT NOT NULL,
id_2 BIGINT NOT NULL,
other_key BIGINT NOT NULL,
val SMALLINT,
CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key)
)
SALT_BUCKETS=60;
*Query:*
Select id_2, sum(val)
>From relationship
Where id_1 in (2,3)
Group by id_2
*Explain:*
0: jdbc:phoenix:> EXPLAIN Select id_2, sum(val) From relationship Where id_1 in
(2,3) Group by id_2 ;
+-----------------------------------------------------------------------------------------+--------+
| PLAN | EST_BY |
+-----------------------------------------------------------------------------------------+--------+
| CLIENT 60-CHUNK PARALLEL 60-WAY SKIP SCAN ON 120 KEYS OVER RELATIONSHIP [0,2]
- [59,3] | null |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [ID_2]
| null |
| CLIENT MERGE SORT
| null |
+-----------------------------------------------------------------------------------------+--------+
3 rows selected (0.048 seconds)
In this case, although the group by is performed on both the client and
regionserver, almost all of the actual grouping happens on the client because
the id_2’s are randomly distributed across the regionservers. As a result, a
lot of unnecessary data is serialized to the client and grouped serially there.
This can become quite material with large resultsets.
*Proposed:*
*Schema:*
CREATE TABLE relationship (
id_1 BIGINT NOT NULL,
id_2 BIGINT NOT NULL,
other_key BIGINT NOT NULL,
val SMALLINT,
CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key),
SALT_BUCKETS=60,
SALT_COLUMN = id_2
);
*Query (unchanged):*
Select id_2, sum(val)
>From relationship
Where id_1 in (2,3)
Group by id_2
*Explain (unchanged)*
Under the proposal, the data are merely partitioned so that all rows containing
the same id_2 are on the same regionserver, the above query will perform almost
all of the grouping in parallel on the regionservers. No special hint or
changes to the query plan would be required to benefit. Tables would need to be
re-salted to take advantage of the new functionality.
*Technical changes proposed to phoenix:*
* Create a new piece of table-level metadata: SALT_COLUMN. SALT_COLUMN will
instruct the salting logic to generate a salt-byte based only on the specified
column. If unspecified, it will behave as it does today and default to salting
the entire rowkey. This metadata may be specified only when the table is
created and may not be modified. The specified column must be part of the
rowkey.
* Modify all callers of
[getSaltingByte|https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/schema/SaltingUtil.java#L77](byte[]
value, int offset, int length, int bucketNum) to consistently leverage the new
metadata.
* Tests
* Docs
*Design points:*
*One salt column vs multiple salt columns:* Based on the existing signature for
getSaltingByte, it seems simpler to only support a single SALT_COLUMN rather
than multiple arbitrary SALT_COLUMNS. Known use-cases are completely supported
by a single column.
*Syntax:* PHOENIX-4757 suggests an alternate, less verbose syntax for defining
the salt bucket. The SALT_COLUMN syntax is suggested for clarity and
consistency with other Phoenix table options.
*Future Enhancements (not in scope)*
Different aspects of the query execution runtime could take advantage of new
metadata and implied knowledge that the data are partitioned in a predictable
manner. For example:
* It could be that client side grouping is completely unnecessary in cases
where the SALT_COLUMN is part of the group-by expression.
* A query that contains a literal equality predicate for the SALT_COLUMN can
be isolated to a single regionserver, rather than broadcast to all
regionservers.
* A client-side merge-sort-join based on the SALT_COLUMN could optimize
organization of merges.
* Similarly, a server-side hash join could distribute only ‘necessary’
portions of the hash table to each regionserver.
If additional advantages of these types come for free, then that’s great but
can be follow on enhancements from the initial commit.
> composite key salt_buckets
> --------------------------
>
> Key: PHOENIX-4757
> URL: https://issues.apache.org/jira/browse/PHOENIX-4757
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.11.0
> Reporter: cmd
> Priority: Major
> Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
> user_id VARCHAR NOT NULL,
> event_type VARCHAR NOT NULL,
> event_time VARCHAR NOT NULL
> event_msg VARCHAR NOT NULL
> event_status VARCHAR NOT NULL
> event_opt VARCHAR NOT NULL
> CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time))
> SALT_BUCKETS=128;
> and my query is:
> 1.select event_type,count(0) from us_population where user_id='xxxx' group
> by event_type
> 2.select count(0) from us_population where user_id='xxxx' and
> event_type='0101'
> 3.select * from us_population where user_id='xxxx' and event_type='0101' and
> event_time>'20180101' and event_time<'20180201' order by event_time limit
> 50,100
> Concurrency query ratio:
> 1:80%
> 2:10%
> 3:10%
> user_events data :50billion
> It can be a field/some fileds of the primary key salted by hash
> grammar with "SALT_BUCKETS(user_id)=4" or
> "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)