[https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824637#comment-15824637]
ASF GitHub Bot commented on COMMONSRDF-51:
------------------------------------------
Github user ansell commented on a diff in the pull request:
https://github.com/apache/commons-rdf/pull/30#discussion_r96309778
--- Diff: api/src/test/java/org/apache/commons/rdf/api/AbstractRDFTest.java ---
@@ -194,6 +194,114 @@ public void testCreateLiteralLangISO693_3() throws Exception {
assertEquals("\"Herbert Van de Sompel\"@vls",
vls.ntriplesString());
}
+ public void testCreateLiteralLangCaseInsensitive() throws Exception {
+ // COMMONSRDF-51: Literal langtag may not be in lowercase, but
+ // must be COMPARED (aka .equals and .hashCode()) in lowercase
+ // as the language space is lower case.
+ final Literal lower = factory.createLiteral("Hello", "en-gb");
+ final Literal upper = factory.createLiteral("Hello", "EN-GB");
+ final Literal mixed = factory.createLiteral("Hello", "en-GB");
+
+
+ assertEquals("en-gb", lower.getLanguageTag().get());
--- End diff --
RDF4J may not follow this in some cases. It may apply the BCP47 normalisation
conventions instead, so getLanguageTag() could return en-GB rather than en-gb.
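As a rough illustration (this does not call RDF4J; it uses only java.util.Locale, and the helper names bcp47Normalise and rdfValueSpace are hypothetical), BCP47-style canonical casing and the RDF 1.1 lowercase form produce different strings for the same tag, yet agree once both are lowercased:

```java
import java.util.Locale;

public class LangTagCaseDemo {

    // Hypothetical helper: BCP47-style canonical casing as produced by
    // java.util.Locale (language subtag lowercase, region subtag uppercase).
    static String bcp47Normalise(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    // Hypothetical helper: the RDF 1.1 value-space form, i.e. the whole tag
    // lowercased with a fixed locale.
    static String rdfValueSpace(String tag) {
        return tag.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-gb", "EN-GB", "en-GB"}) {
            // Every input prints "BCP47: en-GB, lowercase: en-gb".
            System.out.println(tag + " -> BCP47: " + bcp47Normalise(tag)
                    + ", lowercase: " + rdfValueSpace(tag));
        }
    }
}
```

So a test asserting that getLanguageTag() returns exactly "en-gb" could fail against an implementation that normalises, while an equality check on the lowercased tags would still pass.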
> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>
> Key: COMMONSRDF-51
> URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
> Project: Apache Commons RDF
> Issue Type: Bug
> Components: api
> Affects Versions: 0.3.0
> Reporter: Peter Ansell
> Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language
> tags is lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal],
> which does not conflict with the case-insensitive specification in BCP47. The
> Literal.equals and Literal.hashCode API contracts should specify that
> language tags must be compared in lowercase, even if getLanguageTag otherwise
> stores and returns them with upper-case characters. The API documentation is
> currently incorrect in saying that language tags are compared
> "character-by-character", as that implies a case-sensitive comparison.
> The lowercasing must also use a consistent locale (Turkish is a known example
> where lowercasing and uppercasing US-ASCII characters does not round-trip as
> expected [1]), so I would recommend explicitly stating that
> .toLowerCase(Locale.ENGLISH) is used.
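A minimal sketch of the contract proposed above, assuming a hypothetical LangLiteral class (lexical form and language tag only, no datatype) rather than the real Commons RDF Literal interface: getLanguageTag() returns the tag exactly as supplied, while equals and hashCode both use toLowerCase(Locale.ENGLISH). The main method also shows the Turkish pitfall on a tag-like string:

```java
import java.util.Locale;
import java.util.Objects;
import java.util.Optional;

public class LangLiteralDemo {

    // Hypothetical, simplified literal (lexical form + language tag only, no
    // datatype); not the Commons RDF Literal interface or any implementation.
    static final class LangLiteral {
        private final String lexicalForm;
        private final String languageTag; // kept exactly as supplied

        LangLiteral(String lexicalForm, String languageTag) {
            this.lexicalForm = Objects.requireNonNull(lexicalForm);
            this.languageTag = Objects.requireNonNull(languageTag);
        }

        String getLexicalForm() {
            return lexicalForm;
        }

        // Returns the tag as stored, so it may contain upper-case characters.
        Optional<String> getLanguageTag() {
            return Optional.of(languageTag);
        }

        // Comparison key: lowercased with a fixed locale so the result does
        // not depend on the JVM default locale.
        private String langKey() {
            return languageTag.toLowerCase(Locale.ENGLISH);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof LangLiteral)) {
                return false;
            }
            LangLiteral other = (LangLiteral) o;
            return lexicalForm.equals(other.lexicalForm)
                    && langKey().equals(other.langKey());
        }

        // Uses the same lowercased key as equals() to keep the contract.
        @Override
        public int hashCode() {
            return Objects.hash(lexicalForm, langKey());
        }
    }

    public static void main(String[] args) {
        LangLiteral lower = new LangLiteral("Hello", "en-gb");
        LangLiteral upper = new LangLiteral("Hello", "EN-GB");
        System.out.println(lower.equals(upper));                  // true
        System.out.println(lower.hashCode() == upper.hashCode()); // true
        System.out.println(upper.getLanguageTag().get());         // EN-GB

        // Why the locale matters: under Turkish rules 'I' lowercases to the
        // dotless 'ı' (U+0131), so "IT" does not become "it".
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("IT".toLowerCase(turkish).equals("it"));        // false
        System.out.println("IT".toLowerCase(Locale.ENGLISH).equals("it")); // true
    }
}
```

Locale.ROOT would work just as well as Locale.ENGLISH here; what matters is that the locale is fixed rather than taken from the JVM default.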