[ https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820820#comment-15820820 ]
Stian Soiland-Reyes commented on COMMONSRDF-51:
-----------------------------------------------
[BCP47 section 2.1.1|https://tools.ietf.org/html/bcp47#section-2.1.1] also
clearly states that case carries no meaning and is only a convention. We should
therefore try to preserve the casing as given, but not use it for comparison:
{quote}
At all times, language tags and their subtags, including private use
and extensions, are to be treated as case insensitive: there exist
conventions for the capitalization of some of the subtags, but these
MUST NOT be taken to carry meaning.
Thus, the tag "mn-Cyrl-MN" is not distinct from "MN-cYRL-mn" or
"mN-cYrL-Mn" (or any other combination), and each of these variations
conveys the same meaning: Mongolian written in the Cyrillic script as
used in Mongolia.
The ABNF syntax also does not distinguish between upper- and
lowercase: the uppercase US-ASCII letters in the range 'A' through
'Z' are always considered equivalent and mapped directly to their
US-ASCII lowercase equivalents in the range 'a' through 'z'. So the tag
"I-AMI" is considered equivalent to that value "i-ami" in the
'irregular' production.
{quote}
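As a rough sketch of what "preserve the casing, but not use it for comparison" could look like, the Java below applies the quoted ABNF rule literally: only US-ASCII 'A' through 'Z' are mapped to 'a' through 'z', so no Locale is involved at all. The class Bcp47Case and its methods are hypothetical illustrations, not part of Commons RDF, and the code assumes the tags have already been checked to be well-formed BCP47.

```java
/**
 * Hypothetical sketch of BCP47 section 2.1.1 matching: only 'A'..'Z' are
 * mapped to 'a'..'z', as the quoted ABNF text describes; every other char
 * is left untouched. Assumes tags were already validated as BCP47.
 */
public final class Bcp47Case {

    private Bcp47Case() {}

    /** Lowercases US-ASCII letters only, independent of any Locale. */
    static String asciiLowerCase(String tag) {
        StringBuilder sb = new StringBuilder(tag.length());
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            // Shift 'A'..'Z' down to 'a'..'z'; leave everything else as-is.
            sb.append((c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c);
        }
        return sb.toString();
    }

    /** True if two tags differ at most in the case of their ASCII letters. */
    static boolean sameTag(String a, String b) {
        return asciiLowerCase(a).equals(asciiLowerCase(b));
    }

    public static void main(String[] args) {
        String stored = "mn-Cyrl-MN"; // kept exactly as supplied
        System.out.println(sameTag(stored, "MN-cYRL-mn")); // true
        System.out.println(sameTag(stored, "mN-cYrL-Mn")); // true
        System.out.println(sameTag("I-AMI", "i-ami"));     // true
        System.out.println(sameTag(stored, "mn-Latn-MN")); // false
        System.out.println(stored);                        // mn-Cyrl-MN
    }
}
```

Because the stored string is never rewritten, a getLanguageTag-style accessor could still hand back the caller's original casing while comparisons ignore it.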
I'll push the branch as a pull request and file bugs upstream for the Turkish
locale issue.
> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>
> Key: COMMONSRDF-51
> URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
> Project: Apache Commons RDF
> Issue Type: Bug
> Components: api
> Affects Versions: 0.3.0
> Reporter: Peter Ansell
> Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language
> tags is lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal],
> which does not conflict with the case-insensitive specification in BCP47. The
> Literal.equals and Literal.hashCode API contracts should specify that
> language tags must be compared in lowercase, even if they are otherwise
> stored with upper-case characters and returned that way by getLanguageTag.
> The API documentation currently says language tags are compared
> "character-by-character", which is incorrect because it implies a
> case-sensitive comparison.
> The lowercasing must also be done with a consistent, fixed locale (a known
> example where lowercase and uppercase conversions of US-ASCII characters do
> not round-trip as expected is the Turkish locale [1]), so I would recommend
> explicitly stating that .toLowerCase(Locale.ENGLISH) is used.
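A minimal sketch of the proposed contract, assuming a hypothetical SimpleLiteral class (not the Commons RDF Literal interface) that keeps the datatype as a plain IRI string: equals and hashCode both go through the same tag lowercased with Locale.ENGLISH, while getLanguageTag returns the tag unchanged. The last lines of main show the Turkish-locale problem described above.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical literal (not the Commons RDF Literal interface) following the
 * proposed contract: equals() and hashCode() use the language tag lowercased
 * with Locale.ENGLISH, while getLanguageTag() returns it as supplied.
 */
public final class SimpleLiteral {
    private final String lexicalForm;
    private final String datatypeIri;
    private final String languageTag; // null when there is no tag

    public SimpleLiteral(String lexicalForm, String datatypeIri, String languageTag) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.datatypeIri = Objects.requireNonNull(datatypeIri);
        this.languageTag = languageTag;
    }

    public Optional<String> getLanguageTag() {
        return Optional.ofNullable(languageTag); // original casing preserved
    }

    // Fixed locale: the result does not depend on Locale.getDefault().
    private Optional<String> comparableTag() {
        return getLanguageTag().map(t -> t.toLowerCase(Locale.ENGLISH));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleLiteral)) return false;
        SimpleLiteral other = (SimpleLiteral) o;
        return lexicalForm.equals(other.lexicalForm)
                && datatypeIri.equals(other.datatypeIri)
                && comparableTag().equals(other.comparableTag());
    }

    @Override
    public int hashCode() {
        // Hash the same lowercased tag that equals() compares.
        return Objects.hash(lexicalForm, datatypeIri, comparableTag());
    }

    public static void main(String[] args) {
        String langString = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";
        SimpleLiteral a = new SimpleLiteral("hello", langString, "en-IN");
        SimpleLiteral b = new SimpleLiteral("hello", langString, "EN-IN");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(b.getLanguageTag().get());     // EN-IN

        // With a Turkish locale, 'I' lowercases to dotless U+0131, not 'i'.
        String turkish = "EN-IN".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("en-in"));      // false
        System.out.println(turkish.indexOf('\u0131'));    // 3
    }
}
```

Deriving hashCode from the same lowercased tag that equals compares keeps the two consistent. Locale.ROOT would give the same result as Locale.ENGLISH for these US-ASCII tags; the point is only that the locale is fixed rather than taken from the JVM default.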
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)