[https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573081#comment-13573081]
Christopher Tubbs commented on ACCUMULO-164:
--------------------------------------------
I'm not necessarily against the feature, but I would like to expand on the
opposition [~elserj] [mentioned|#comment-13570560] above, under the "*Against*"
section (apologies in advance for its lack of brevity):
The objection is that the basic role of column families is to create logical
groups of columns with increased locality within a row. To me, the most
intuitive application of column families as logical groups is to have a
discrete set of them. It seems to me that applications with continuous
variability in the column family could just as easily put that variability in
the column qualifier. Indeed, it seems to me that this is what the column
qualifier should be used for: the variability needed to uniquely identify a
value within a discrete logical grouping provided by the column family.
If the column family is not used this way, then it seems to me that the column
family just gets reduced to "column element 1 that sorts after row" and the
column qualifier gets reduced to "column element 2 that sorts after column
element 1". While I realize some applications may already be using these
elements of the key in this way, don't need discrete column families, and
simply find it convenient to split their columns into two pieces for
whatever reason, I think that these applications are breaking the basic data
model that the API presents for an Accumulo table (which is already
pretty basic to begin with).
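To make the contrast concrete, here is a minimal, hypothetical sketch in plain Java (no Accumulo dependency). It relies only on two facts: a locality group is configured as an explicit set of column families, and families not listed in any group end up in the default group. The class, method, group, and family names are invented for illustration, and exact set membership stands in for the real configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model for illustration; this is NOT the Accumulo API.
public class LocalityGroupSketch {

    // Stripped-down key; real Accumulo keys also carry visibility and timestamp.
    static final class Key {
        final String row;
        final String family;
        final String qualifier;

        Key(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Group name -> explicit set of column families (exact matches only).
    static final Map<String, Set<String>> GROUPS = new LinkedHashMap<>();

    static {
        GROUPS.put("metadata", Set.of("meta"));
        GROUPS.put("readings", Set.of("reading"));
    }

    // Returns the group whose family set contains the key's family,
    // or "default" when no group lists that family.
    static String groupFor(Key key) {
        for (Map.Entry<String, Set<String>> entry : GROUPS.entrySet()) {
            if (entry.getValue().contains(key.family)) {
                return entry.getKey();
            }
        }
        return "default";
    }

    public static void main(String[] args) {
        // Discrete family; the variability (a timestamp) lives in the qualifier.
        Key discrete = new Key("device42", "reading", "2013-02-06T12:00");
        // Same data with the variability pushed into the family instead.
        Key variable = new Key("device42", "reading:2013-02-06T12:00", "value");

        System.out.println(groupFor(discrete)); // prints "readings"
        System.out.println(groupFor(variable)); // prints "default"
    }
}
```

With a discrete family, one group entry covers every reading. Once the timestamp moves into the family, no finite list of families can name them all, which is the case the proposed wildcard/regex support is meant to address.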
While it's fine for these applications to break the basic data model implied by
the structured key (reducing it to "sorted key dimension 1", "sorted key
dimension 2", "sorted key dimension 3", etc.), I think that when they do, they
make it that much harder to express, with a common language, their particular
table schemas (when a row doesn't mean row in any traditional database sense at
all, when a family doesn't mean a collection of related items, when a qualifier
doesn't mean uniqueness, when a value doesn't actually get used to hold the
contents of a cell identified by the key).
I personally think that this added difficulty in expressing the intentions
and uses of any particular element of the structured key, once those intentions
become nothing more than nominal, raises the barrier to entry and makes the
API more confusing.
To summarize, I think the proposed feature encourages table schemas to break
the basic data model of discrete logical groupings of related columns in a row,
and I think that existing schemas that rely on variability in the column family
could nearly as easily rely on that variability in the qualifier. I also think
the use of discrete column families is easier to express in documentation,
in the API, and in examples, and that it reduces the complexity of table
schemas overall.
However, I also understand that it may be very convenient to have this in many
applications (particularly existing applications that don't want to
redefine working table schemas to take advantage of locality groups), so I'm
not necessarily against the feature. I would just like to see it, and any other
feature that breaks the basic data model implied by the structured key, treated
as an "expert" feature rather than the norm.
> Add support for wildcards/regexes in locality group setting.
> ------------------------------------------------------------
>
> Key: ACCUMULO-164
> URL: https://issues.apache.org/jira/browse/ACCUMULO-164
> Project: Accumulo
> Issue Type: Improvement
> Components: client, master, tserver
> Reporter: John Vines
> Fix For: 1.6.0
>
>
> We should look into adding the ability to specify locality group columns as
> either wildcarding or regexes. I'm unsure of the feasibility of this, hence
> the lack of fix date.