Hey Gerald,

Trimming back to just dev@phoenix, but I am curious to hear some more about what you and Mike are thinking.

Some initial questions:

* What are the problem(s) that you see today with the current implementation of SALT_BUCKETS?
* How would your new feature/proposal work?
* How would your new feature solve your current problem?
* What are the drawbacks (if any) of your new feature?

I've definitely seen a problem where folks negatively impact their reads by "over-salting" because they were too lazy when writing data (either to think about a good distribution or to write some code to ingest their data).

Thanks in advance!

- Josh

On 9/10/18 4:56 PM, Gerald Sangudi wrote:
Hello folks,

We have a requirement for salting based on partial, rather than full, rowkeys. My colleague Mike Polcari has identified the requirement and proposed an approach.

I found an already-open JIRA ticket for the same issue: https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more details from the proposal.

The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike proposes SALT_COLUMN=col or SALT_COLUMNS=col, ...

The benefit at issue is that users gain more control over partitioning, and this can be used to push some additional aggregations and hash joins down to region servers.
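
As a rough illustration of the idea only (this is not Phoenix's actual hashing and not code from the proposal; the function names, the toy hash, and the sample rows are all hypothetical), salting on a subset of the rowkey columns means that every row sharing those column values lands in the same bucket, whereas salting on the full rowkey gives no such guarantee:

```python
def salt_byte(key_parts, n_buckets):
    """Toy salt function (an assumption, not Phoenix's hash):
    hash the given key parts and reduce modulo the bucket count."""
    h = 0
    for part in key_parts:
        for b in part.encode("utf-8"):
            h = (31 * h + b) & 0xFFFFFFFF
    return h % n_buckets


def full_key_bucket(row, n_buckets):
    # Salting on the full rowkey: every column feeds the hash.
    return salt_byte(row, n_buckets)


def partial_key_bucket(row, salt_cols, n_buckets):
    # Salting on selected columns only: rows that agree on those
    # columns always map to the same bucket.
    return salt_byte([row[i] for i in salt_cols], n_buckets)


# Hypothetical rowkeys of the form (tenant, day).
rows = [
    ("tenant_a", "2018-09-01"),
    ("tenant_a", "2018-09-02"),
    ("tenant_b", "2018-09-01"),
]

for row in rows:
    print(row, full_key_bucket(row, 8), partial_key_bucket(row, [0], 8))
```

With the salt computed from the first column alone, both tenant_a rows are co-located, which is what would allow a per-tenant aggregation or join to be evaluated within a single bucket.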

I would appreciate any go-ahead / thoughts / guidance / objections / feedback. I'd like to be sure that the concept at least is not objectionable. We would like to work on this and submit a patch down the road. I'll also add a note to the JIRA ticket.

Thanks,
Gerald
