[
https://issues.apache.org/jira/browse/ACCUMULO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447960#comment-13447960
]
Keith Turner commented on ACCUMULO-241:
---------------------------------------
FYI
I read up on UTF-8 [1] to see whether it would work with the quoting changes I
made. It seems that UTF-8 within quotes in a visibility expression will work
just fine, so theoretically Accumulo visibility labels should now support
non-ASCII characters. I was worried that a multi-byte character might contain
a quote byte, but this cannot happen with UTF-8. The MSB [2] is always set to
1 in every byte of a multi-byte UTF-8 encoded character, whereas the ASCII
quote char has its MSB set to 0. Therefore a multi-byte character will never
contain a quote byte; when a quote byte occurs in UTF-8, it can only be the
ASCII quote char.
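
A minimal sketch of the argument above, written in Python for illustration
rather than taken from the Accumulo code. It assumes the quote character is the
ASCII double quote (byte 0x22); the function names and the sample label are
hypothetical.

```python
QUOTE = 0x22  # assumption: the quote char is the ASCII double quote


def quote_positions(data):
    """Return the byte offsets at which the quote byte occurs."""
    return [i for i, b in enumerate(data) if b == QUOTE]


def multibyte_bytes_have_msb_set(text):
    """Check that every byte of every multi-byte character has its MSB set."""
    for ch in text:
        encoded_char = ch.encode("utf-8")
        if len(encoded_char) > 1 and any(b < 0x80 for b in encoded_char):
            return False
    return True


# Hypothetical quoted label mixing ASCII and multi-byte characters.
label = '"résumé&日本語"'
encoded = label.encode("utf-8")

# The quote byte appears only where an actual ASCII quote char was written,
# here the first and last byte of the encoded label.
print(quote_positions(encoded))
print(multibyte_bytes_have_msb_set(label))
```

Because no byte of a multi-byte sequence is below 0x80, a byte-oriented scan
for the closing quote should not be fooled by the contents of the label.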
[1]: http://en.wikipedia.org/wiki/UTF-8
[2]: http://en.wikipedia.org/wiki/Most_significant_bit
> Visibility labels should blacklist non-ASCII characters instead of
> whitelisting select ASCII characters
> -------------------------------------------------------------------------------------------------------
>
> Key: ACCUMULO-241
> URL: https://issues.apache.org/jira/browse/ACCUMULO-241
> Project: Accumulo
> Issue Type: Improvement
> Affects Versions: 1.3.5
> Reporter: John Vines
> Labels: visibility
> Fix For: 1.3.6
>
> Attachments: ACCUMULO-241-quoting-2.txt, ACCUMULO-241-quoting.txt
>
>
> We currently whitelist our visibility labels to only allow alphanumerics and
> a few select delimiting characters. While we strive for human-readable
> labels, we should instead utilize a blacklist approach where we disallow
> parentheses, ampersands, pipes, and any non-ASCII characters. This will
> provide users with more flexibility in labeling, while still sticking to
> human readability.