[
https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990090#comment-15990090
]
ASF GitHub Bot commented on JENA-1313:
--------------------------------------
Github user kinow commented on the issue:
https://github.com/apache/jena/pull/237
I have a sandbox project at https://github.com/kinow/jena-arq-filter. It does
not contain real unit tests, just some renamed main methods that I use for
experimenting. You can check out this pull request, open both projects in the
same workspace, and then try something like this:
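```
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Query string: order four Finnish labels using the "fi" collation
final String queryString =
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n" +
    "PREFIX arq: <http://jena.apache.org/ARQ/function#>\n" +
    "SELECT ?label WHERE {\n" +
    " VALUES ?label { \"tsahurin kieli\"@fi \"tšekin kieli\"@fi \"tulun kieli\"@fi \"töyhtöhyyppä\"@fi }\n" +
    "}\n" +
    "ORDER BY arq:collation(\"fi\", ?label)";
// Empty model: the VALUES block supplies all the data
Model model = ModelFactory.createDefaultModel();
// Query object
Query query = QueryFactory.create(queryString);
// Execute the query and print each solution
try (QueryExecution qExec = QueryExecutionFactory.create(query, model)) {
    ResultSet results = qExec.execSelect();
    while (results.hasNext()) {
        QuerySolution solution = results.nextSolution();
        System.out.println(solution);
    }
}
```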
The result will be:
```
( ?label = "tsahurin kieli"@fi )
( ?label = "tšekin kieli"@fi )
( ?label = "tulun kieli"@fi )
( ?label = "töyhtöhyyppä"@fi )
```
If you change the locale to "en", the result will be:
```
( ?label = "töyhtöhyyppä"@fi )
( ?label = "tsahurin kieli"@fi )
( ?label = "tšekin kieli"@fi )
( ?label = "tulun kieli"@fi )
```
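The difference between the two orderings can be reproduced without Jena. The following is only a sketch: it assumes that a locale-specific collator from the JDK (`java.text.Collator`) behaves like the collation applied by `arq:collation`, which this message does not state. The class name `CollationSketch` and the helper `sortWith` are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationSketch {

    // Hypothetical helper: returns a sorted copy of the labels, ordered by
    // the JDK collator for the given language tag (e.g. "fi" or "en").
    static List<String> sortWith(String languageTag, List<String> labels) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Same labels as in the VALUES block of the query above
        List<String> labels = Arrays.asList(
                "tsahurin kieli", "tšekin kieli", "tulun kieli", "töyhtöhyyppä");
        System.out.println("fi: " + sortWith("fi", labels));
        System.out.println("en: " + sortWith("en", labels));
    }
}
```

With Finnish rules "ö" is treated as a separate letter at the end of the alphabet, so "töyhtöhyyppä" moves to the end of the list. With English rules it is treated as an accented "o", so the same label sorts before "tsahurin kieli".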
> Language-specific collation in ARQ
> ----------------------------------
>
> Key: JENA-1313
> URL: https://issues.apache.org/jira/browse/JENA-1313
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 3.2.0
> Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users
> mailing list in October 2016, I would like to change ARQ collation of literal
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
> method.
> It currently sorts by lexical value first, then by language tag. Since the
> collation order needs to be stable across all possible literal values, I
> think the safest way would be to sort by language tag first, then by lexical
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different
> collation rules than the main language? It would be a bit strange if all
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in
> implementing it.
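The ordering proposed in the issue (language tag first, then lexical value under that language's collation rules) could be sketched roughly as below. This is an illustration only, not the code in NodeUtils or in the pull request: `LangLiteral` is a hypothetical stand-in for a language-tagged literal, and using `java.text.Collator` for the per-language rules is an assumption.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class LangFirstOrdering {

    // Hypothetical stand-in for an RDF literal with a language tag.
    static final class LangLiteral {
        final String lexical;
        final String lang;

        LangLiteral(String lexical, String lang) {
            this.lexical = lexical;
            this.lang = lang;
        }

        @Override
        public String toString() {
            return "\"" + lexical + "\"@" + lang;
        }
    }

    // Compare by language tag first, then by that language's collation rules.
    // A plain string comparison breaks ties so that distinct lexical values
    // never compare as equal.
    static final Comparator<LangLiteral> LANG_THEN_COLLATION = (a, b) -> {
        int byLang = a.lang.compareToIgnoreCase(b.lang);
        if (byLang != 0) {
            return byLang;
        }
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byCollation = collator.compare(a.lexical, b.lexical);
        return byCollation != 0 ? byCollation : a.lexical.compareTo(b.lexical);
    };

    // Returns a sorted copy of the input list.
    static List<LangLiteral> sort(List<LangLiteral> literals) {
        List<LangLiteral> sorted = new ArrayList<>(literals);
        sorted.sort(LANG_THEN_COLLATION);
        return sorted;
    }

    public static void main(String[] args) {
        List<LangLiteral> literals = Arrays.asList(
                new LangLiteral("zebra", "en"),
                new LangLiteral("töyhtöhyyppä", "fi"),
                new LangLiteral("apple", "en-US"),
                new LangLiteral("tulun kieli", "fi"),
                new LangLiteral("apple", "en"));
        sort(literals).forEach(System.out::println);
    }
}
```

Because the tags are compared as plain strings here, every `@en-US` literal ends up after every `@en` literal, which is exactly the behaviour the issue calls a bit strange. Grouping by primary language subtag instead would be one way around that, at the cost of deciding how subtags with different collation rules should interleave.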