[https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824366#comment-15824366]
ASF GitHub Bot commented on COMMONSRDF-51:
------------------------------------------
Github user stain commented on the issue:
https://github.com/apache/commons-rdf/pull/30
This pull request returns `getLanguageTag()` in whatever case the
underlying platform uses (e.g. I think RDF4J and JSONLD-Java preserve casing,
while Jena and Simple convert to lowercase).
I think it is only in `.equals()` and `.hashCode()` that we need case
insensitivity.
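A minimal sketch of what that split could look like, assuming a hypothetical `CaseInsensitiveLangLiteral` class (this is not the Commons RDF `Literal` interface or any of its implementations): the tag is returned exactly as it was given, and only `equals()` and `hashCode()` lowercase it, using `Locale.ENGLISH` as the issue below recommends.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.Optional;

// Hypothetical minimal literal, NOT the Commons RDF Literal interface.
// The language tag is stored and returned as given, but equals() and
// hashCode() compare it in lowercase (Locale.ENGLISH).
final class CaseInsensitiveLangLiteral {
    private final String lexicalForm;
    private final String datatypeIri;
    private final String languageTag; // null when the literal has no language tag

    CaseInsensitiveLangLiteral(String lexicalForm, String datatypeIri, String languageTag) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.datatypeIri = Objects.requireNonNull(datatypeIri);
        this.languageTag = languageTag;
    }

    // Presentation form: whatever case the tag was created with.
    Optional<String> getLanguageTag() {
        return Optional.ofNullable(languageTag);
    }

    // Comparison form: used only by equals() and hashCode().
    private Optional<String> normalisedTag() {
        return getLanguageTag().map(t -> t.toLowerCase(Locale.ENGLISH));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CaseInsensitiveLangLiteral)) return false;
        CaseInsensitiveLangLiteral other = (CaseInsensitiveLangLiteral) o;
        return lexicalForm.equals(other.lexicalForm)
                && datatypeIri.equals(other.datatypeIri)
                && normalisedTag().equals(other.normalisedTag());
    }

    @Override
    public int hashCode() {
        // Must hash the same normalised tag that equals() compares.
        return Objects.hash(lexicalForm, datatypeIri, normalisedTag());
    }
}
```

Both methods have to use the same normalised tag; if `hashCode()` used the raw tag, two literals that are `equals()` ("en-GB" vs "en-gb") could end up in different hash buckets.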
There are arguments both ways on whether we should provide a consistent view
across the implementations (e.g. always lowercase) or stay consistent with what
the underlying implementation does (e.g. if it preserves casing for
presentation purposes).
Commons RDF doesn't currently have any value handling mechanisms for, say,
converting `"13.37"^^xsd:float` to a Java float `13.37f` (without going through
the underlying implementations and their related methods), or for determining
value equality, so I think it is not too weird if Commons RDF doesn't do
anything clever about language tags either (beyond spec compliance).
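To illustrate the kind of value handling meant here, a sketch of a hypothetical `XsdFloatParser.parseXsdFloat` helper (not an existing Commons RDF method), assuming the XSD 1.1 lexical rules for `xsd:float`. It validates the lexical form before calling `Float.parseFloat`, because the Java parser is more lenient than XSD.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Commons RDF: maps the lexical form of an
// xsd:float literal to a Java float, assuming XSD 1.1 lexical rules.
final class XsdFloatParser {
    // Decimal or scientific notation; special values are handled separately.
    private static final Pattern DECIMAL_OR_SCIENTIFIC =
            Pattern.compile("[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?");

    private XsdFloatParser() {}

    static float parseXsdFloat(String lexicalForm) {
        switch (lexicalForm) {
            case "INF":
            case "+INF":
                return Float.POSITIVE_INFINITY;
            case "-INF":
                return Float.NEGATIVE_INFINITY;
            case "NaN":
                return Float.NaN;
            default:
                // Float.parseFloat also accepts "Infinity", hex floats and
                // "f"/"d" suffixes, none of which are valid xsd:float forms.
                if (!DECIMAL_OR_SCIENTIFIC.matcher(lexicalForm).matches()) {
                    throw new NumberFormatException("Not an xsd:float lexical form: " + lexicalForm);
                }
                return Float.parseFloat(lexicalForm);
        }
    }
}
```

With this sketch, `parseXsdFloat("13.37")` yields `13.37f`, while `"Infinity"` is rejected even though `Float.parseFloat` would accept it.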
But if someone were to add a Commons RDF API for such literal value
handling, it could be natural to also add "utils" methods for presenting or
parsing language tags (e.g. `isLanguageTagEqual("en-us", "en-US")`, as well as
hierarchical comparisons, something like `isSameLanguageTagFamily("en-us",
"en-GB")`).
> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>
> Key: COMMONSRDF-51
> URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
> Project: Apache Commons RDF
> Issue Type: Bug
> Components: api
> Affects Versions: 0.3.0
> Reporter: Peter Ansell
> Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language
> tags is
> lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal], which
> does not conflict with the case-insensitive specification in BCP47. The
> Literal.equals and Literal.hashCode API contracts should specify that
> language tags must be compared using lowercase, even if they are otherwise
> stored and returned as upper-case by getLanguageTag. The API contract
> currently has incorrect wording: it says "character-by-character" for
> language tag comparisons, which implies that case-sensitive comparisons are
> used.
> The lowercasing must also be done using a consistent locale (a known
> example where lowercasing and uppercasing do not round-trip as expected for
> US-ASCII characters is Turkish [1]), so I would recommend explicitly stating
> that .toLowerCase(Locale.ENGLISH) is used.
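The Turkish pitfall can be shown with a small sketch (the `TurkishLowercaseDemo` class is hypothetical, for illustration only). Under Turkish casing rules, ASCII `I` lowercases to dotless `ı` (U+0131), so a tag such as `it-IT` would not normalise to `it-it`.

```java
import java.util.Locale;

// Hypothetical demo class: compares lowercasing under Turkish and English
// casing rules for the same ASCII input.
final class TurkishLowercaseDemo {
    private TurkishLowercaseDemo() {}

    // Turkish rules: 'I' becomes dotless U+0131, not 'i'.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr-TR"));
    }

    // English rules: plain ASCII lowercasing for ASCII input.
    static String lowerEnglish(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }
}
```

Calling `toLowerCase()` with no argument uses the JVM's default locale, which is how this bug typically slips in on a machine configured for Turkish.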
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)