Hey Gerald,
Trimming back to just dev@phoenix, but I am curious to hear some more
about what you and Mike are thinking.
Some initial questions:
* What problem(s) do you see today with the current
implementation of SALT_BUCKETS?
* How would your new feature/proposal work?
* How would your new feature solve your current problem?
* What are the drawbacks (if any) of your new feature?
I've definitely seen a problem where folks negatively impact their reads
by "over-salting" because they were too lazy when writing data (either
to think about a good distribution or to write some code to ingest their
data).
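To make the read-side cost concrete, here is a rough sketch of why more
buckets means more work per range scan. It is an illustration only, not
Phoenix code: the helper name salted_scan_ranges and the sample keys are
made up, and it assumes the usual salting layout of a one-byte bucket
prefix in front of the row key, so a logical range scan has to be
issued once per bucket.

```python
def salted_scan_ranges(start_key: bytes, stop_key: bytes, num_buckets: int):
    """Return one (start, stop) pair per salt bucket for a logical range scan.

    Hypothetical helper: assumes each stored key is a one-byte bucket
    prefix followed by the original row key.
    """
    return [
        (bytes([bucket]) + start_key, bytes([bucket]) + stop_key)
        for bucket in range(num_buckets)
    ]


# The same logical range costs as many physical scans as there are buckets.
few = salted_scan_ranges(b"user100", b"user200", 4)
many = salted_scan_ranges(b"user100", b"user200", 64)
print(len(few), len(many))  # 4 64
```

The fan-out grows linearly with the bucket count, which is the read
penalty I mean by "over-salting".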
Thanks in advance!
- Josh
On 9/10/18 4:56 PM, Gerald Sangudi wrote:
Hello folks,
We have a requirement for salting based on partial, rather than full,
rowkeys. My colleague Mike Polcari has identified the requirement and
proposed an approach.
I found an already-open JIRA ticket for the same issue:
https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
details from the proposal.
The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
proposes either SALT_COLUMN=col or SALT_COLUMNS=col, ...
The benefit at issue is that users gain more control over partitioning,
and this can be used to push some additional aggregations and hash joins
down to region servers.
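As a rough illustration of the partitioning difference (a sketch under
stated assumptions, not the proposed implementation): the snippet below
computes a bucket from either the full rowkey or only a leading subset
of its columns. The function salt_bucket, the sample rows, and the use
of zlib.crc32 as the hash are all placeholders chosen for the example;
Phoenix's actual hash function is different.

```python
import zlib
from collections import defaultdict


def salt_bucket(key_parts, num_buckets):
    """Map the given key columns to a bucket in [0, num_buckets).

    Hypothetical helper: crc32 stands in for whatever hash the real
    implementation would use.
    """
    data = "\x00".join(key_parts).encode("utf-8")
    return zlib.crc32(data) % num_buckets


NUM_BUCKETS = 8

# Made-up rowkeys of the form (tenant, day, event_id).
rows = [
    ("tenant_a", "2018-09-10", "evt1"),
    ("tenant_a", "2018-09-10", "evt2"),
    ("tenant_a", "2018-09-11", "evt3"),
    ("tenant_b", "2018-09-10", "evt4"),
]

# Full-rowkey salting: every column feeds the hash, so rows for the
# same tenant may land in different buckets.
full = {row: salt_bucket(row, NUM_BUCKETS) for row in rows}

# Partial-rowkey salting on the first column only: all rows for a
# tenant share one bucket.
partial = {row: salt_bucket(row[:1], NUM_BUCKETS) for row in rows}

buckets_per_tenant = defaultdict(set)
for row, bucket in partial.items():
    buckets_per_tenant[row[0]].add(bucket)

for tenant, buckets in sorted(buckets_per_tenant.items()):
    print(tenant, "->", sorted(buckets))
```

Because every row for a given tenant hashes to a single bucket, an
aggregation or join keyed on that column could in principle be
evaluated bucket-locally, which is the pushdown opportunity described
above.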
I would appreciate any go-ahead / thoughts / guidance / objections /
feedback. I'd like to be sure that the concept at least is not
objectionable. We would like to work on this and submit a patch down the
road. I'll also add a note to the JIRA ticket.
Thanks,
Gerald