GitHub user kinow commented on the issue:
https://github.com/apache/jena/pull/237
> Doesn't this run into the unstable sort issue that @afs cautioned against?
Not sure. I don't think this approach causes it, but I looked for ways the
sort could be unstable and think I found one case.
> I think it could be avoided by the following logic: If two
`NodeValueSortKey`s have different collation languages, sort them by the
collation languages instead of even looking at the text.
>
> This is the (lang, lex) approach discussed earlier, just applied in a
slightly different context.
Sounds like a plan. Let's wait and see what others think.
Now, on stability...
I tried to find ways the sort could be unstable. For two values A and B with
the same collation, the result is stable. For two values C and D with
different collations, or with missing collations, the values are sorted by
their string literals. The node produced is a `Node_Literal` (the function
rewrites any node given to it as a `Node_Literal`).
Now here is the interesting part. `#equals(Object)` and `#hashCode()` use
the node value, i.e. the `Node_Literal` string, to compare values. Using the
approach suggested by @osma, `NodeValue#compare(NodeValue, NodeValue)` for
`NodeValueSortKey`("Casa", "es") and `NodeValueSortKey`("Casa", "pt") would
report that `NodeValueSortKey`("Casa", "es") < `NodeValueSortKey`("Casa",
"pt"), i.e. since the two values have different collation language tags, we
would compare "es" and "pt".
However, `#equals(Object)` and `#hashCode()` look only at the `Node_Literal`
node, so the two keys would be equal and share a hash code:
`NodeValueSortKey`("Casa", "es").equals(`NodeValueSortKey`("Casa", "pt"))
would return true.
I believe this could cause problems: the merge sort would be stable (I
think), but using the elements (sorted or not) in a map or set could result in
weird behaviour, because the comparison and `equals` would disagree about
whether two keys are the same.
Some code to illustrate the above:
```
NodeValueSortKey nvsk1 = new NodeValueSortKey("Casa", "es");
NodeValueSortKey nvsk2 = new NodeValueSortKey("Casa", "pt");
System.out.println(nvsk1.equals(nvsk2));
// true
NodeValueLang nvl1 = new NodeValueLang("Casa", "es");
NodeValueLang nvl2 = new NodeValueLang("Casa", "pt");
System.out.println(nvl1.equals(nvl2));
// false
```
For `NodeValueLang`s, when a `Node_Literal` is created, it is given a
`LiteralLabel` object that it wraps. Then, when you call
`NodeValueLang#equals(Object)`, `NodeValueLang` uses
`LiteralLabel#equals(Object)` to compare against the other `NodeValueLang`,
and `LiteralLabel` checks the language tag.
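To make the map/set concern concrete, here is a minimal sketch in plain Java. `LexOnlyKey` and `SetMismatchDemo` are hypothetical stand-ins, not Jena classes. The key's `equals`/`hashCode` look only at the text, as described for `NodeValueSortKey` above, while the comparator orders by collation tag first. Under those assumptions a hash-based set and a comparator-based set disagree about how many elements they hold.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a sort key; this is NOT the Jena class.
final class LexOnlyKey {
    final String lex;
    final String collation;

    LexOnlyKey(String lex, String collation) {
        this.lex = lex;
        this.collation = collation;
    }

    // Assumption: equality ignores the collation tag, as described above.
    @Override
    public boolean equals(Object o) {
        return o instanceof LexOnlyKey && lex.equals(((LexOnlyKey) o).lex);
    }

    @Override
    public int hashCode() {
        return lex.hashCode();
    }
}

class SetMismatchDemo {
    // Orders by collation tag first, then by text: the (lang, lex) idea.
    static final Comparator<LexOnlyKey> BY_LANG_THEN_LEX =
        Comparator.comparing((LexOnlyKey k) -> k.collation)
                  .thenComparing(k -> k.lex);

    static int hashSetSize() {
        Set<LexOnlyKey> set = new HashSet<>();
        set.add(new LexOnlyKey("Casa", "es"));
        set.add(new LexOnlyKey("Casa", "pt"));
        return set.size(); // equals/hashCode treat both keys as the same
    }

    static int treeSetSize() {
        Set<LexOnlyKey> set = new TreeSet<>(BY_LANG_THEN_LEX);
        set.add(new LexOnlyKey("Casa", "es"));
        set.add(new LexOnlyKey("Casa", "pt"));
        return set.size(); // the comparator treats them as different
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize()); // 1
        System.out.println(treeSetSize()); // 2
    }
}
```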
I wonder if we should create a new `Node_Concrete` implementation in
`org.apache.jena.graph` (Node_SortKey?), or if we should modify
`Node_Literal`... I feel like the latter would be less elegant than the former.
By using the current implementation, plus @osma's suggestion of comparing
the collation language tag, and finally making sure equals/hashCode agree
with what our comparison says, I believe we would have a stable sort.
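A rough sketch of what "equals/hashCode agree with the comparison" could look like, in plain Java. `LangLexKey` is a hypothetical name and not a proposal for the actual class layout. It also assumes plain `String` ordering for the text, whereas the real code would use a collator when the tags match, which needs extra care because a collator can report 0 for strings that are not `equals`.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical sort key; names are illustrative and not part of Jena.
final class LangLexKey implements Comparable<LangLexKey> {
    // Compare the collation tag first, then the text: the (lang, lex) order.
    private static final Comparator<LangLexKey> ORDER =
        Comparator.comparing((LangLexKey k) -> k.collation)
                  .thenComparing(k -> k.lex);

    private final String lex;
    private final String collation; // assumption: "" stands for "no collation"

    LangLexKey(String lex, String collation) {
        this.lex = Objects.requireNonNull(lex);
        this.collation = collation == null ? "" : collation;
    }

    @Override
    public int compareTo(LangLexKey other) {
        return ORDER.compare(this, other);
    }

    // equals and hashCode use the same two fields as compareTo,
    // so compareTo returns 0 exactly when equals returns true.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LangLexKey)) return false;
        LangLexKey k = (LangLexKey) o;
        return lex.equals(k.lex) && collation.equals(k.collation);
    }

    @Override
    public int hashCode() {
        return Objects.hash(lex, collation);
    }
}

class LangLexKeyDemo {
    public static void main(String[] args) {
        LangLexKey es = new LangLexKey("Casa", "es");
        LangLexKey pt = new LangLexKey("Casa", "pt");
        System.out.println(es.equals(pt));        // false
        System.out.println(es.compareTo(pt) < 0); // true, "es" sorts before "pt"
    }
}
```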
Thoughts?