[ 
https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573081#comment-13573081
 ] 

Christopher Tubbs commented on ACCUMULO-164:
--------------------------------------------

I'm not necessarily against the feature, but I would like to expand on the 
opposition [~elserj] [mentioned|#comment-13570560] above, under the "*Against*" 
section (apologies in advance for its lack of brevity):

The objection is that the basic role of column families is to create logical 
groups of columns with increased locality within a row. To me, the most 
intuitive application of column families as logical groups is to have a 
discrete set of them. It seems to me that in applications that have continuous 
variability in the column family, they could just as easily have this 
variability in the column qualifier. Indeed, it seems to me that is what the 
column qualifier should be used for: variability needed to uniquely identify a 
value within a discrete logical grouping provided by the column family.
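
As a rough illustration of the distinction above, the following sketch models 
keys as plain (row, family, qualifier) tuples. It does not use the Accumulo 
API; the family names "attr" and "link", the sample rows, and the helper 
functions are all hypothetical. It only shows a small, fixed set of families 
with the variability pushed into the qualifier:

```python
# Hypothetical model of a structured key; NOT the Accumulo API.
# Assumption: a schema with a small, discrete set of column families.
FAMILIES = {"attr", "link"}


def make_key(row, family, qualifier):
    """Build a (row, family, qualifier) key, rejecting unknown families."""
    if family not in FAMILIES:
        raise ValueError("unknown column family: " + family)
    return (row, family, qualifier)


def qualifiers_in_family(keys, row, family):
    """Return the qualifiers stored under one family of one row, in sort order."""
    return [q for (r, f, q) in sorted(keys) if r == row and f == family]


# The variable part (which attribute, which linked user) lives in the qualifier.
entries = {
    make_key("user1", "link", "user7"): "follows",
    make_key("user1", "attr", "name"): "Alice",
    make_key("user1", "link", "user3"): "follows",
    make_key("user1", "attr", "email"): "alice@example.org",
}

# Sorting by (row, family, qualifier) keeps each family's columns adjacent.
sorted_keys = sorted(entries)
```

Because the families are enumerable up front, a schema description only has to 
list "attr" and "link"; the open-ended part of the key is confined to the 
qualifier.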

If the column family is not used this way, then it seems to me that the column 
family just gets reduced to "column element 1 that sorts after row" and the 
column qualifier gets reduced to "column element 2 that sorts after column 
element 1". While I realize some applications may already be using these 
elements of the key in this way, don't need discrete column families, and 
simply find it convenient to break up their columns into two pieces for 
whatever reason, I think that these applications are breaking the basic data 
model provided by the API corresponding to an Accumulo table (which is already 
pretty basic to begin with).

While it's fine for these applications to break the basic data model implied by 
the structured key (reducing it to "sorted key dimension 1", "sorted key 
dimension 2", "sorted key dimension 3", etc.), I think that when they do, they 
make it that much harder to express, with a common language, their particular 
table schemas (when a row doesn't mean row in any traditional database sense at 
all, when a family doesn't mean a collection of related items, when a qualifier 
doesn't mean uniqueness, when a value doesn't actually get used to hold the 
contents of a cell identified by the key).

I personally think that this increased difficulty in expressing the intentions 
and uses of any particular element of the structured key, when these intentions 
become nothing more than nominal, raises the barrier to entry and makes the 
API more confusing.

All that said, I think the proposed feature encourages table schemas to break 
the basic data model of discrete logical groupings of related columns in a row, 
and I think that existing schemas that rely on variability in the column family 
could nearly as easily rely on that variability in the qualifier. I also think 
the use of discrete column families is more easily expressed in documentation, 
in the API, in examples, and reduces the complexity of table schemas overall.

However, I also understand that it may be very convenient to have this in many 
applications (particularly those existing applications that don't want to 
redefine working table schemas to take advantage of locality groups), so I'm 
not necessarily against the feature. I would just like to see it, and other 
instances of the basic data model implied by the structured key being broken, 
as an "expert" feature, and not the norm.
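
To make the trade-off concrete, here is a minimal sketch contrasting the two 
ways of assigning a column family to a locality group: an explicit set of 
families versus a pattern. This is not how Accumulo implements locality 
groups; the group names, the "link_.*" pattern, and both lookup functions are 
assumptions made purely for illustration:

```python
import re

# Hypothetical configurations; the names are invented for this sketch.
# Exact: each locality group lists a discrete set of column families.
EXACT_GROUPS = {"metadata": {"attr"}, "graph": {"link"}}
# Pattern: each locality group gives a regex that families must fully match.
PATTERN_GROUPS = {"graph": r"link_.*"}


def group_for_exact(family, groups):
    """Return the group whose family set contains `family`, else None."""
    for name, families in groups.items():
        if family in families:
            return name
    return None  # falls through to a default group


def group_for_pattern(family, groups):
    """Return the first group whose regex fully matches `family`, else None."""
    for name, pattern in groups.items():
        if re.fullmatch(pattern, family):
            return name
    return None  # falls through to a default group
```

With discrete families, `group_for_exact("link", EXACT_GROUPS)` finds the 
"graph" group directly. A schema that encodes variability in the family (for 
example a family such as "link_user7") can only be grouped by the pattern 
form, which is the case the proposed feature addresses.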
                
> Add support for wildcards/regexes in locality group setting.
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-164
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-164
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, master, tserver
>            Reporter: John Vines
>             Fix For: 1.6.0
>
>
> We should look into adding the ability to specify locality group columns as 
> either wildcarding or regexes. I'm unsure of the feasibility of this, hence 
> the lack of fix date.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
