I see no reason each system that uses Arrow can't add its own notion of
sortedness (and potentially distribution, as mentioned by Julian), but
given how common the notion is, I felt that having some sort of standard way
to encode the information might make it more useful to the broader Arrow
ecosystem.

I don't have time in the near term to drive such a standardization effort,
but would be happy to help with one if anyone else is interested.

Andrew

On Tue, May 11, 2021 at 3:19 PM Adam Hooper <a...@adamhooper.com> wrote:

> Beware with collations: Collation order is not fixed. As per TR10
> <https://www.unicode.org/reports/tr10/>:
>
> > Over time, collation order will vary: there may be fixes needed as more
> > information becomes available about languages; there may be new government
> > or industry standards for the language that require changes; and finally,
> > new characters added to the Unicode Standard will interleave with the
> > previously-defined ones. This means that collations must be carefully
> > versioned.
>
>
> I don't know of any nice solutions.
>
> Postgres has plans <https://wiki.postgresql.org/wiki/Collations> to version
> collations in v13/v14. I'm a Postgres user who experienced index corruption
> between collation versions. To me, Postgres' effort seems both cutting-edge
> and essential.
>
> Enjoy life,
> Adam
>
> --
> Adam Hooper
> +1-514-882-9694
> http://adamhooper.com
>
