[
https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965782#comment-15965782
]
ASF GitHub Bot commented on JENA-1313:
--------------------------------------
GitHub user kinow opened a pull request:
https://github.com/apache/jena/pull/237
JENA-1313: compare using a Collator when both literals are tagged with same
language
Mimics the behaviour of Dydra described
[here](http://blog.dydra.com/2015/05/06/collation).
When both strings are tagged with the same language, the comparison no
longer looks only at the raw text. Instead it uses
[java.text.Collator](https://docs.oracle.com/javase/7/docs/api/java/text/Collator.html)
with the locale for that language to compare the strings.
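A minimal sketch of the idea, not the code in this pull request: the class
`LangCollation` and its `compare` method are hypothetical names, and literals
are passed as plain (lexical form, language tag) string pairs rather than Jena
nodes. When the tags differ or are missing, it assumes a fallback to the
ordering the issue describes as current behaviour (lexical value first, then
language tag).

```java
import java.text.Collator;
import java.util.Locale;

public class LangCollation {

    /**
     * Compares two literals given as (lexical form, language tag) pairs.
     * Hypothetical helper for illustration; not part of Jena.
     */
    public static int compare(String lex1, String lang1, String lex2, String lang2) {
        // Same non-empty language tag: use that language's collation rules.
        if (!lang1.isEmpty() && lang1.equalsIgnoreCase(lang2)) {
            Collator collator = Collator.getInstance(Locale.forLanguageTag(lang1));
            int c = collator.compare(lex1, lex2);
            if (c != 0) {
                return c;
            }
            // Collator considers them equal: break the tie on code units
            // so that distinct strings still get a stable order.
            return lex1.compareTo(lex2);
        }
        // Assumed fallback: lexical value first, then language tag.
        int c = lex1.compareTo(lex2);
        return c != 0 ? c : lang1.compareToIgnoreCase(lang2);
    }

    public static void main(String[] args) {
        // With a shared "en" tag the Collator decides the order.
        System.out.println(compare("a", "en", "B", "en"));
        // Without language tags the plain String comparison decides.
        System.out.println(compare("a", "", "B", ""));
    }
}
```

With a shared tag the Collator places "a" before "B", while the untagged
comparison goes by code unit and places "B" first, which is the kind of
difference users would see in ORDER BY results.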
This does not create a collate:collate function, which JENA-1313 describes
as a possible solution, but it could still be useful for users who expect
sort results to follow the collation of the values' language.
Needs further tests and discussion.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kinow/jena JENA-1313-1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/237.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #237
----
commit fdcfc6307d7d0f4cbd850adeeb48d3ca9300c266
Author: Bruno P. Kinoshita <[email protected]>
Date: 2017-04-12T12:44:42Z
JENA-1313: compare using a Collator when both literals are tagged with same
language
----
> Language-specific collation in ARQ
> ----------------------------------
>
> Key: JENA-1313
> URL: https://issues.apache.org/jira/browse/JENA-1313
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 3.2.0
> Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users
> mailing list in October 2016, I would like to change ARQ collation of literal
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
> method.
> It currently sorts by lexical value first, then by language tag. Since the
> collation order needs to be stable across all possible literal values, I
> think the safest way would be to sort by language tag first, then by lexical
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different
> collation rules than the main language? It would be a bit strange if all
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in
> implementing it.