[
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820811#comment-15820811
]
Stian Soiland-Reyes commented on COMMONSRDF-51:
-----------------------------------------------
Yes, so both a-z and A-Z are permitted and valid in Turtle etc. However, the
spec also says (and this is where it conflicts with the "character by character" wording):
> Lexical representations of language tags may be converted to lower case.
and:
> The value space of language tags is always in lower case.
So doing a case-sensitive comparison sounds fragile: one implementation may
lowercase tags (e.g. Jena) while another leaves them as-is (e.g. JSON-LD), and
that would break calls like graph.contains().
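To illustrate the fragility, here is a minimal sketch assuming two hypothetical implementations, one that lowercases tags on parsing and one that keeps them as written. The class and method names (`LangTagComparison`, `caseSensitiveMatch`, `lowercaseMatch`) are illustrative only and are not Commons RDF API.

```java
import java.util.Locale;

/** Hypothetical illustration; not part of the Commons RDF API. */
class LangTagComparison {

    // "Character by character" comparison, as the current wording implies.
    static boolean caseSensitiveMatch(String tag1, String tag2) {
        return tag1.equals(tag2);
    }

    // Comparison in the lowercase value space, using a fixed locale.
    static boolean lowercaseMatch(String tag1, String tag2) {
        return tag1.toLowerCase(Locale.ENGLISH)
                .equals(tag2.toLowerCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        String fromImpl1 = "en-gb"; // implementation that lowercases tags
        String fromImpl2 = "en-GB"; // implementation that keeps the tag as written
        System.out.println(caseSensitiveMatch(fromImpl1, fromImpl2)); // false
        System.out.println(lowercaseMatch(fromImpl1, fromImpl2));     // true
    }
}
```

The same literal parsed by the two implementations only matches when the tags are compared in lowercase.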
So even if my [public-rdf-comments
question|http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/thread.html]
concludes in favour of case sensitivity, we would probably still want Commons RDF
to compare language tags in lowercase for consistent interoperability. In
that case we should perhaps also add tests to the graph and dataset
implementations to ensure that no "call-through" breaks literal equivalence.
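A minimal sketch of such a check, assuming a hypothetical `LangLiteral` class (not the Commons RDF `Literal` interface; the datatype is omitted for brevity) and a `HashSet` standing in for a graph, since a hash-based `contains()` only works if `equals()` and `hashCode()` agree. The tag is kept as written for display, but lowercased with `Locale.ENGLISH` whenever it is compared or hashed.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

/** Hypothetical language-tagged literal; not the Commons RDF Literal interface. */
final class LangLiteral {
    private final String lexicalForm;
    private final String languageTag; // stored as given, e.g. "en-GB"

    LangLiteral(String lexicalForm, String languageTag) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.languageTag = Objects.requireNonNull(languageTag);
    }

    // Returns the tag with its original case preserved.
    Optional<String> getLanguageTag() {
        return Optional.of(languageTag);
    }

    // Tag in the lowercase value space, used only for comparison and hashing.
    private String normalizedTag() {
        return languageTag.toLowerCase(Locale.ENGLISH);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof LangLiteral)) return false;
        LangLiteral that = (LangLiteral) other;
        return lexicalForm.equals(that.lexicalForm)
                && normalizedTag().equals(that.normalizedTag());
    }

    @Override
    public int hashCode() {
        // Must hash the same normalized tag that equals() compares.
        return Objects.hash(lexicalForm, normalizedTag());
    }

    public static void main(String[] args) {
        Set<LangLiteral> graphStandIn = new HashSet<>();
        graphStandIn.add(new LangLiteral("colour", "en-GB"));
        System.out.println(graphStandIn.contains(new LangLiteral("colour", "en-gb"))); // true
        System.out.println(new LangLiteral("colour", "en-GB").getLanguageTag().get()); // en-GB
    }
}
```

If `hashCode()` used the raw tag while `equals()` used the lowercased one, the `contains()` call above could fail even though the two literals are equal.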
> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>
> Key: COMMONSRDF-51
> URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
> Project: Apache Commons RDF
> Issue Type: Bug
> Components: api
> Affects Versions: 0.3.0
> Reporter: Peter Ansell
> Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language
> tags is lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal],
> which does not conflict with the case-insensitive specification in BCP47. The
> Literal.equals and Literal.hashCode API contracts should specify that
> language tags must be compared in lowercase, even if they are otherwise
> stored and returned in uppercase by getLanguageTag. The API documentation
> currently says "character-by-character" for language tag comparisons, which
> is incorrect because it implies that case-sensitive comparisons are used.
> The lowercasing must also be done using a consistent locale (a known
> example where lowercasing and uppercasing do not round-trip as expected for
> US-ASCII characters is Turkish [1]), so I would recommend explicitly stating
> that .toLowerCase(Locale.ENGLISH) is used.
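A minimal sketch of the Turkish pitfall, assuming a hypothetical helper `LangTagNormalizer.normalize` that applies the recommended `toLowerCase(Locale.ENGLISH)`. Under Turkish case rules, an uppercase `I` lowercases to the dotless `ı` (U+0131), so a tag such as `en-IN` does not normalize to `en-in`.

```java
import java.util.Locale;

/** Hypothetical helper showing why the lowercasing locale must be fixed. */
class LangTagNormalizer {

    // Normalizes with a fixed locale, independent of the JVM default locale.
    static String normalize(String languageTag) {
        return languageTag.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String tag = "en-IN"; // English as used in India

        // Under Turkish rules, 'I' lowercases to dotless 'ı' (U+0131), not 'i'.
        String turkish = tag.toLowerCase(Locale.forLanguageTag("tr"));

        System.out.println(turkish.equals("en-in"));        // false
        System.out.println(normalize(tag).equals("en-in")); // true
        // A bare tag.toLowerCase() uses the JVM default locale, so on a
        // Turkish-configured JVM it would behave like the first case.
    }
}
```

`Locale.ROOT` is another common choice for locale-independent case mapping; for ASCII language tags it gives the same result as `Locale.ENGLISH`.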