[https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573687#comment-13573687]
Christopher Tubbs commented on ACCUMULO-164:
--------------------------------------------
{quote}
The feature in its simplest sense does conflict with the original idea behind
locality groups, but is that always "bad"? I'm not sure, but it's definitely
different.
{quote}
We've already extended the original idea behind locality groups by allowing
users to specify more than one column family for a locality group, and I think
that is definitely not "bad" ("good", even). This is just an easier way to
select multiple families to put in a locality group, based on a common
characteristic (like a common prefix).
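For reference, Accumulo's client API today takes an explicit set of families per group (TableOperations.setLocalityGroups accepts a Map<String, Set<Text>>). A rough sketch of what a prefix rule amounts to, assuming the families are known up front: expand the prefix into that same group-to-families shape. PrefixGroups and selectByPrefix are hypothetical names, and plain String stands in for Hadoop Text.

```java
import java.util.*;

// Hypothetical sketch: expand a prefix rule into the explicit
// group -> families map shape that Accumulo's client API already accepts.
// Plain String stands in for Hadoop Text to keep this self-contained.
public class PrefixGroups {

    // Returns every known family that starts with the given prefix, sorted.
    static Set<String> selectByPrefix(Collection<String> knownFamilies, String prefix) {
        Set<String> selected = new TreeSet<>();
        for (String family : knownFamilies) {
            if (family.startsWith(prefix)) {
                selected.add(family);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> families =
            List.of("meta_author", "meta_date", "body", "meta_title", "thumbnail");
        Map<String, Set<String>> groups = new HashMap<>();
        groups.put("metadata", selectByPrefix(families, "meta_"));
        System.out.println(groups); // {metadata=[meta_author, meta_date, meta_title]}
    }
}
```

The catch is that such an expansion only covers families seen so far; a family created later would need the rule re-applied, which is where a built-in wildcard would differ from an explicit set.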
However, I question why something like "common prefix" should be a desirable
selection mechanism for multiple families in the first place. Not only are these
data (in the case of the common prefix) already grouped locally without any use
of locality groups, since families sharing a prefix sort next to each other
within a row, but it's also not clear to me that "common prefix" is the most
sensible way to group related families in the general case.
I'm not sure there *is* a general case, though. For some users, perhaps
len < 4 is more useful than a common prefix. Further, the only application
for this that I can think of is when users introduce variability into the
family that lets the number of distinct families grow continuously (which,
I think, can and should be done in the qualifier instead). So, I personally
see little benefit to it, at least for the common prefix case, though full
regexes or suffixes would certainly have greater benefit.
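To illustrate the difference between these selectors, here is a small sketch, assuming ASCII family names so that Java's String order matches the byte order Accumulo sorts by. Families sharing a prefix form one contiguous run in sorted order, while suffix, length, and regex rules pick families scattered across that order. All class and method names are hypothetical.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch; assumes ASCII family names, so String order
// matches the byte order Accumulo sorts by.
public class FamilySelectors {

    // Families sharing a prefix are one contiguous run in sorted order:
    // start at the prefix and stop at the first family that doesn't match.
    static List<String> prefixRun(NavigableSet<String> sorted, String prefix) {
        List<String> run = new ArrayList<>();
        for (String family : sorted.tailSet(prefix, true)) {
            if (!family.startsWith(prefix)) {
                break; // nothing later in the order can share the prefix
            }
            run.add(family);
        }
        return run;
    }

    // Selectors whose matches are generally NOT contiguous in sort order.
    static Predicate<String> suffix(String s) {
        return f -> f.endsWith(s);
    }

    static Predicate<String> shorterThan(int n) {
        return f -> f.length() < n;
    }

    static Predicate<String> regex(String r) {
        Pattern p = Pattern.compile(r);
        return f -> p.matcher(f).matches();
    }

    // Applies a selector to every family, preserving iteration order.
    static List<String> select(Collection<String> families, Predicate<String> rule) {
        List<String> out = new ArrayList<>();
        for (String f : families) {
            if (rule.test(f)) {
                out.add(f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> sorted = new TreeSet<>(
            List.of("abc", "img_png", "doc_png", "doc_txt", "id", "zz"));
        System.out.println(prefixRun(sorted, "doc_"));               // [doc_png, doc_txt]
        System.out.println(select(sorted, suffix("_png")));          // [doc_png, img_png]
        System.out.println(select(sorted, shorterThan(4)));          // [abc, id, zz]
        System.out.println(select(sorted, regex("doc_.*|id")));      // [doc_png, doc_txt, id]
    }
}
```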
Maybe the most useful and general thing we could do to give users the most
flexibility in selecting families for a locality group is to let them inject a
user-defined hash function (maybe in JEXL?) that bins families into discrete
localities by whatever method they choose?
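A minimal sketch of that idea, with a plain Java Function standing in for a JEXL expression (a swap made only to keep the example self-contained): the user-supplied function names a group for each family, and any family it returns null for lands in a default group. LocalityBinner and the "default" group name are hypothetical, not part of Accumulo's API.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch: a user-supplied function bins each column family
// into a named locality group. A plain Java Function stands in for JEXL.
public class LocalityBinner {
    static final String DEFAULT_GROUP = "default"; // hypothetical group name

    private final Function<String, String> binFn;

    LocalityBinner(Function<String, String> binFn) {
        this.binFn = binFn;
    }

    // Families the user function doesn't claim (returns null) go to the default group.
    String groupFor(String family) {
        String group = binFn.apply(family);
        return group == null ? DEFAULT_GROUP : group;
    }

    // Groups a collection of observed families by the bin each one falls into.
    Map<String, SortedSet<String>> bin(Collection<String> families) {
        Map<String, SortedSet<String>> groups = new TreeMap<>();
        for (String f : families) {
            groups.computeIfAbsent(groupFor(f), g -> new TreeSet<>()).add(f);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Any rule the user likes: here, short families vs. image families.
        LocalityBinner binner = new LocalityBinner(f -> {
            if (f.length() < 4) return "small";
            if (f.startsWith("img")) return "images";
            return null;
        });
        System.out.println(binner.bin(List.of("id", "img_png", "body", "img_jpg", "ts")));
        // {default=[body], images=[img_jpg, img_png], small=[id, ts]}
    }
}
```

Because the function is evaluated per family, a family first written long after the rule was configured would still be binned consistently, which an explicit set cannot do.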
{quote}
Do you have any ideas on how to present such a feature that would avoid
steering the common user toward it? Is healthy warning/documentation sufficient?
{quote}
If implemented, I think documentation should be sufficient to address all of my
concerns. The main thing is just to make it clear that the feature is used to
*select* multiple column families, so that it isn't implied that families with
variability *are* the same "family". The API treats non-equal families as
distinct, and that's how we should discuss them.
> Add support for wildcards/regexes in locality group setting.
> ------------------------------------------------------------
>
> Key: ACCUMULO-164
> URL: https://issues.apache.org/jira/browse/ACCUMULO-164
> Project: Accumulo
> Issue Type: Improvement
> Components: client, master, tserver
> Reporter: John Vines
> Fix For: 1.6.0
>
>
> We should look into adding the ability to specify locality group columns with
> either wildcards or regexes. I'm unsure of the feasibility of this, hence
> the lack of a fix date.
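On the feasibility question, a minimal sketch of just the matching half: a shell-style wildcard (assumed here to mean * for any run of characters and ? for exactly one) can be translated into an anchored regex, so a single matcher could serve both kinds of specification. GlobToRegex is a hypothetical name, and the sketch says nothing about how a tserver would apply such rules.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: translate a shell-style wildcard into a regex,
// quoting literal runs so characters like '.' are not treated as regex syntax.
public class GlobToRegex {

    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush any pending literal text before emitting the wildcard.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    // matches() requires the whole family to match, so the pattern is
    // implicitly anchored; DOTALL lets wildcards span any character.
    static boolean matches(String glob, String family) {
        return Pattern.compile(globToRegex(glob), Pattern.DOTALL)
                      .matcher(family)
                      .matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("meta_*", "meta_date")); // true
        System.out.println(matches("meta_*", "body"));      // false
        System.out.println(matches("v?.log", "v1.log"));    // true
        System.out.println(matches("v?.log", "v1xlog"));    // false
    }
}
```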