I see no reason each system that uses Arrow can't add its own notion of sortedness (and potentially distribution, as Julian mentioned), but given how common the notion is, I felt that a standard way to encode the information might make it more useful to the broader Arrow ecosystem.
I don't have time in the near term to drive such a standardization effort, but I would be happy to help with one if anyone else is interested.

Andrew

On Tue, May 11, 2021 at 3:19 PM Adam Hooper <a...@adamhooper.com> wrote:

> Beware of collations: collation order is not fixed. As per TR10
> <https://www.unicode.org/reports/tr10/>:
>
> > Over time, collation order will vary: there may be fixes needed as more
> > information becomes available about languages; there may be new government
> > or industry standards for the language that require changes; and finally,
> > new characters added to the Unicode Standard will interleave with the
> > previously-defined ones. This means that collations must be carefully
> > versioned.
>
> I don't know of any nice solutions.
>
> Postgres has plans <https://wiki.postgresql.org/wiki/Collations> to version
> collations in v13/v14. I'm a Postgres user who experienced index corruption
> between collation versions. To me, Postgres' effort seems both cutting-edge
> and essential.
>
> Enjoy life,
> Adam
>
> --
> Adam Hooper
> +1-514-882-9694
> http://adamhooper.com