GitHub user kinow commented on the issue:
https://github.com/apache/jena/pull/237
>I agree with the other commenters, the general order should be (lang, lex)
>to avoid potentially inconsistent ordering.
Ack, that makes sense +1
>Also the language tag may not match any Locale. We also need to have unit
>tests that verify that the code works in corner cases like this.
Sure, tests and more defensive programming will come later. Right now I am
looking more for comments on how to sort, where to sort, etc.
Besides typos/misspellings, there are also valid tags such as i-klingon (I
believe this is mentioned in some specification linked in the SPARQL spec
page). For cases like this I think we would simply try to match against the
JVM's available locales, and if there is no match, then just use normal
string comparison.
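To make that fallback concrete, here is a rough sketch. The class and method names (LangTagCollation, comparatorFor, isCollationLocale) are made up for illustration and are not part of Jena or of this PR. It assumes an exact match against Collator.getAvailableLocales() is the test we want, which may well be too strict.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical helper, not part of Jena.
final class LangTagCollation {
    private LangTagCollation() {}

    /**
     * Returns a locale-aware comparator when the JVM lists a collation
     * locale that matches the tag exactly, and plain String ordering
     * otherwise (missing tag, ill-formed tag, or unknown locale).
     */
    static Comparator<String> comparatorFor(String langTag) {
        if (langTag == null || langTag.isEmpty()) {
            return Comparator.naturalOrder();
        }
        // forLanguageTag does not throw on bad input: ill-formed subtags
        // are dropped, so a fully ill-formed tag yields an empty language.
        Locale locale = Locale.forLanguageTag(langTag);
        if (locale.getLanguage().isEmpty() || !isCollationLocale(locale)) {
            return Comparator.naturalOrder();
        }
        Collator collator = Collator.getInstance(locale);
        return collator::compare;
    }

    /** Assumption: an exact match in the JVM's collation locale list is required. */
    static boolean isCollationLocale(Locale locale) {
        return Arrays.asList(Collator.getAvailableLocales()).contains(locale);
    }
}
```

Which locales the JVM reports depends on the JDK version and installed locale data, so which tags get a Collator and which fall back to plain comparison will vary between environments.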
>But what about subtags like en-US and en-GB? If the language tag is the
>primary sort key, then all en-GB values would sort before "a"@en-US, which I
>think would be confusing for most users.
The sort order and collation locale could be based on just the main tag (en
in this case) ignoring the subtags, but I'm quite sure there is some language
subtag out there in the world that requires a different collation order from
that of the main language...
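As a sketch of the "main tag only" idea: the comparator below groups values by the primary language subtag, then by lexical form, and only uses the full tag as a tie-breaker. PrimaryTagOrder and LangString are hypothetical names for illustration, not Jena types. The lexical comparison here is plain String ordering, purely to keep the example small; a Collator for the primary language could take its place.

```java
import java.util.Comparator;
import java.util.Locale;

// Hypothetical illustration, not part of Jena.
final class PrimaryTagOrder {
    private PrimaryTagOrder() {}

    /** Stand-in for a language-tagged literal: lexical form plus tag. */
    static final class LangString {
        final String lex;
        final String lang;

        LangString(String lex, String lang) {
            this.lex = lex;
            this.lang = lang;
        }
    }

    /** Primary language subtag, lower-cased: "en-GB" becomes "en". */
    static String primaryTag(String lang) {
        int dash = lang.indexOf('-');
        String primary = dash < 0 ? lang : lang.substring(0, dash);
        return primary.toLowerCase(Locale.ROOT);
    }

    /** Order by primary subtag, then lexical form, then the full tag. */
    static final Comparator<LangString> BY_PRIMARY_TAG_THEN_LEX =
        Comparator.comparing((LangString s) -> primaryTag(s.lang))
                  .thenComparing(s -> s.lex)
                  .thenComparing(s -> s.lang);
}
```

With this ordering "a"@en-US sorts before "b"@en-GB, because both fall under the same primary subtag en and are then compared by lexical form.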
For example, the sort order of accented letters is different for fr-CA and
fr-FR.
fr-FR:
* cote
* coté
* côte
* côté
fr-CA:
* cote
* côte
* coté
* côté