Re: Salting based on partial rowkeys

Gerald Sangudi Thu, 13 Sep 2018 15:38:40 -0700

In case the text formatting is lost below, I also added it as a comment in
the JIRA ticket:


https://issues.apache.org/jira/browse/PHOENIX-4757


On Thu, Sep 13, 2018 at 3:24 PM, Gerald Sangudi <gsang...@23andme.com>
wrote:

> Sorry I missed Josh's reply; I've subscribed to the dev list now.
>
> Below is a copy-and-paste from our internal document. Thanks in advance
> for your review and additional feedback on this.
>
> Gerald
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *BackgroundWe make extensive use of multi-column rowkeys and salting
> <https://phoenix.apache.org/salted.html> in our different apache phoenix
> deployments. We frequently perform group-by aggregations on these data
> along a specific dimension that would benefit from predictably partitioning
> the data along that dimension. Proposal:We propose to add table metadata to
> allow schema designers to constrain salting to a subset of the rowkey,
> rather than the full rowkey as it is today. This will introduce a mechanism
> to partition data on a per-table basis along a single dimension without
> application changes or much change to the phoenix runtime logic. We expect
> this will result in substantially faster group-by’s along the salted
> dimension and negligible penalties elsewhere. This feature has also been
> proposed in PHOENIX-4757
> <https://issues.apache.org/jira/browse/PHOENIX-4757> where it was pointed
> out that partitioning and sorting data along different dimensions is a
> common pattern in other datastores as well.Theoretically, it could cause
> hotspotting when querying along the salted dimension without the leading
> rowkey - that would be an anti-pattern.Usage ExampleCurrent:Schema:CREATE
> TABLE relationship (id_1 BIGINT NOT NULL,id_2 BIGINT NOT NULL,other_key
> BIGINT NOT NULL,val SMALLINT,CONSTRAINT pk PRIMARY KEY (id_1, id_2,
> other_key))SALT_BUCKETS=60;Query:Select id_2, sum(val)From
> relationshipWhere id_1 in (2,3)Group by id_2Explain:0: jdbc:phoenix:>
> EXPLAIN Select id_2, sum(val) From relationship Where id_1 in (2,3) Group
> by id_2
> ;+-----------------------------------------------------------------------------------------+--------+|
>                                          PLAN    | EST_BY
> |+-----------------------------------------------------------------------------------------+--------+|
> CLIENT 60-CHUNK PARALLEL 60-WAY SKIP SCAN ON 120 KEYS OVER RELATIONSHIP
> [0,2] - [59,3]  | null ||     SERVER AGGREGATE INTO DISTINCT ROWS BY [ID_2]
>                                       | null || CLIENT MERGE SORT
>                                                                       |
> null
> |+-----------------------------------------------------------------------------------------+--------+3
> rows selected (0.048 seconds)In this case, although the group by is
> performed on both the client and regionserver, almost all of the actual
> grouping happens on the client because the id_2’s are randomly distributed
> across the regionservers. As a result, a lot of unnecessary data is
> serialized to the client and grouped serially there. This can become quite
> material with large resultsets.Proposed:Schema:CREATE TABLE relationship
> (id_1 BIGINT NOT NULL,id_2 BIGINT NOT NULL,other_key BIGINT NOT NULL,val
> SMALLINT,CONSTRAINT pk PRIMARY KEY (id_1, id_2,
> other_key),SALT_BUCKETS=60,SALT_COLUMN = id_2);Query (unchanged):Select
> id_2, sum(val)From relationshipWhere id_1 in (2,3)Group by id_2Explain
> (unchanged)Under the proposal, the data are merely partitioned so that all
> rows containing the same id_2 are on the same regionserver, the above query
> will perform almost all of the grouping in parallel on the regionservers.
> No special hint or changes to the query plan would be required to benefit.
> Tables would need to be re-salted to take advantage of the new
> functionality.Technical changes proposed to phoenix: - Create a new piece
> of table-level metadata: SALT_COLUMN. SALT_COLUMN will instruct the salting
> logic to generate a salt-byte based only on the specified column. If
> unspecified, it will behave as it does today and default to salting the
> entire rowkey. This metadata may be specified only when the table is
> created and may not be modified. The specified column must be part of the
> rowkey.  - Modify all callers of getSaltingByte
> <https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/schema/SaltingUtil.java#L77>(byte[]
> value, int offset, int length, int bucketNum) to consistently leverage the
> new metadata.- Tests- DocsDesign points:One salt column vs multiple salt
> columns: Based on the existing signature for getSaltingByte, it seems
> simpler to only support a single SALT_COLUMN rather than multiple arbitrary
> SALT_COLUMNS. Known use-cases are completely supported by a single
> column.Syntax:  PHOENIX-4757
> <https://issues.apache.org/jira/browse/PHOENIX-4757> suggests an alternate,
> less verbose syntax for defining the salt bucket. The SALT_COLUMN syntax is
> suggested for clarity and consistency with other Phoenix table
> options.Future Enhancements (not in scope)Different aspects of the query
> execution runtime could take advantage of new metadata and implied
> knowledge that the data are partitioned in a predictable manner. For
> example: - It could be that client side grouping is completely unnecessary
> in cases where the SALT_COLUMN is part of the group-by expression.- A query
> that contains a literal equality predicate for the SALT_COLUMN can be
> isolated to a single regionserver, rather than broadcast to all
> regionservers.- A client-side merge-sort-join based on the SALT_COLUMN
> could optimize organization of merges. - Similarly, a server-side hash join
> could distribute only ‘necessary’ portions of the hash table to each
> regionserver.If additional advantages of these types come for free, then
> that’s great but can be follow on enhancements from the initial commit.*
>
> On Thu, Sep 13, 2018 at 9:33 AM, Thomas D'Silva <tdsi...@salesforce.com>
> wrote:
>
>> Gerald,
>>
>> I think you missed Josh's reply here : https://lists.apache.org/thr
>> ead.html/c5145461805429622a410c23c1199d578e146a5c94511b2d583
>> 3438b@%3Cdev.phoenix.apache.org%3E
>>
>> Could you explain how using a subset of the pk columns to generate the
>> salt byte helps with partitioning, aggregations etc?
>>
>> Thanks,
>> Thomas
>>
>> On Thu, Sep 13, 2018 at 8:32 AM, Gerald Sangudi <gsang...@23andme.com>
>> wrote:
>>
>>> Hi folks,
>>>
>>> Any thoughts or feedback on this?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi <gsang...@23andme.com>
>>> wrote:
>>>
>>>> Hello folks,
>>>>
>>>> We have a requirement for salting based on partial, rather than full,
>>>> rowkeys. My colleague Mike Polcari has identified the requirement and
>>>> proposed an approach.
>>>>
>>>> I found an already-open JIRA ticket for the same issue:
>>>> https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
>>>> details from the proposal.
>>>>
>>>> The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
>>>> proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
>>>>
>>>> The benefit at issue is that users gain more control over partitioning,
>>>> and this can be used to push some additional aggregations and hash joins
>>>> down to region servers.
>>>>
>>>> I would appreciate any go-ahead / thoughts / guidance / objections /
>>>> feedback. I'd like to be sure that the concept at least is not
>>>> objectionable. We would like to work on this and submit a patch down the
>>>> road. I'll also add a note to the JIRA ticket.
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>>
>>>
>>
>

Re: Salting based on partial rowkeys

Reply via email to