[ https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009485#comment-16009485 ]
ASF GitHub Bot commented on JENA-1313:
--------------------------------------
Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/237#discussion_r116365710
--- Diff: jena-arq/src/main/java/org/apache/jena/sparql/expr/NodeValue.java ---
@@ -774,19 +772,10 @@ private static int compare(NodeValue nv1, NodeValue nv2, boolean sortOrderingCom
             }
             case VSPACE_SORTKEY :
             {
-                int cmp = 0;
-                String c1 = nv1.getCollation();
-                String c2 = nv2.getCollation();
-                if (c1 != null && c2 != null && c1.equals(c2)) {
-                    // locales are parsed. Here we could think about caching if necessary
-                    Locale desiredLocale = Locale.forLanguageTag(c1);
-                    // collators are already stored in a concurrent map by the JVM, with <locale, softref<collator>>
-                    Collator collator = Collator.getInstance(desiredLocale);
-                    cmp = collator.compare(nv1.getString(), nv2.getString());
-                } else {
-                    cmp = XSDFuncOp.compareString(nv1, nv2) ;
+                if (!(nv1 instanceof NodeValueSortKey) || !(nv2 instanceof NodeValueSortKey)) {
+                    raise(new ExprNotComparableException("Can't compare (not node value sort keys) "+nv1+" and "+nv2)) ;
                 }
-                return cmp;
+                return ((NodeValueSortKey) nv1).compareTo((NodeValueSortKey) nv2);
             }
--- End diff --
Should that be `nv1.getSortKey()`? All the NodeValues have downcast
operations getXYZ().
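
To illustrate the suggestion, here is a minimal sketch of the downcast-accessor pattern next to the `instanceof`-plus-cast form in the diff. All class names below (`Value`, `SortKeyValue`, `PlainValue`, `DowncastSketch`) are hypothetical stand-ins, not Jena's classes, and returning `null` from the base accessor is an assumption made for the example, not a statement about how `NodeValue` behaves.

```java
// Hypothetical stand-ins for illustration only; these are not Jena's classes.
class DowncastSketch {

    static abstract class Value {
        // Assumed default: a value that is not a sort key returns null here.
        SortKeyValue getSortKey() { return null; }
    }

    static final class SortKeyValue extends Value implements Comparable<SortKeyValue> {
        private final String text;

        SortKeyValue(String text) { this.text = text; }

        // The subclass overrides the accessor and returns itself, already typed.
        @Override
        SortKeyValue getSortKey() { return this; }

        @Override
        public int compareTo(SortKeyValue other) { return text.compareTo(other.text); }
    }

    static final class PlainValue extends Value { }

    // The comparison asks each value for its sort key instead of casting.
    static int compare(Value v1, Value v2) {
        SortKeyValue k1 = v1.getSortKey();
        SortKeyValue k2 = v2.getSortKey();
        if (k1 == null || k2 == null) {
            throw new IllegalArgumentException(
                "Can't compare (not sort keys) " + v1 + " and " + v2);
        }
        return k1.compareTo(k2);
    }

    public static void main(String[] args) {
        System.out.println(compare(new SortKeyValue("a"), new SortKeyValue("b")));
    }
}
```

With an accessor like this the call site needs no casts, and the type check lives in one place per subclass.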
> Language-specific collation in ARQ
> ----------------------------------
>
> Key: JENA-1313
> URL: https://issues.apache.org/jira/browse/JENA-1313
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 3.2.0
> Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users
> mailing list in October 2016, I would like to change ARQ collation of literal
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
> method.
> It currently sorts by lexical value first, then by language tag. Since the
> collation order needs to be stable across all possible literal values, I
> think the safest way would be to sort by language tag first, then by lexical
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different
> collation rules than the main language? It would be a bit strange if all
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in
> implementing it.
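
The ordering proposed in the description (language tag first, then lexical value under that language's collation rules) could be sketched roughly as follows. This is only an illustration under stated assumptions: `Literal` and `LangAwareSortSketch` are hypothetical names, not ARQ code, and the final tie-break on the raw string is an addition of this sketch so that distinct literals never compare as equal. Only the standard `java.text.Collator` and `java.util.Locale` APIs are used.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class LangAwareSortSketch {

    // Hypothetical minimal literal: lexical form plus language tag ("" if none).
    static final class Literal {
        final String lexical;
        final String lang;

        Literal(String lexical, String lang) {
            this.lexical = lexical;
            this.lang = lang;
        }

        @Override
        public String toString() { return "\"" + lexical + "\"@" + lang; }
    }

    static final Comparator<Literal> BY_LANG_THEN_COLLATION = (a, b) -> {
        // 1. Group by language tag; tags are compared case-insensitively.
        int byLang = a.lang.compareToIgnoreCase(b.lang);
        if (byLang != 0) return byLang;

        // 2. Same language: compare with that language's collator.
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byCollation = collator.compare(a.lexical, b.lexical);
        if (byCollation != 0) return byCollation;

        // 3. Assumed tie-break (not from the issue): fall back to code-point order.
        return a.lexical.compareTo(b.lexical);
    };

    public static void main(String[] args) {
        List<Literal> literals = new ArrayList<>();
        literals.add(new Literal("apple", "fi"));
        literals.add(new Literal("Banana", "en"));
        literals.add(new Literal("zebra", "en"));
        literals.add(new Literal("apple", "en"));
        literals.add(new Literal("color", "en-US"));
        literals.sort(BY_LANG_THEN_COLLATION);
        literals.forEach(System.out::println);
    }
}
```

Because the first step compares the whole tag, every `en-US` literal in this sketch lands after every `en` literal, which is exactly the behaviour the description calls a bit strange. Grouping on the primary language subtag instead would be one way to avoid it, at the cost of deciding which collator to use within the group.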