On Mon, Oct 21, 2013 at 02:14:32PM +0200, Martin Pelikan wrote:
> Indeed, doing collation properly (i.e. with Unicode, not just 8 bit
> characters like FreeBSD does) really is a non-trivial effort.
>
> It requires some expertise in linguistics and a solid understanding
> of the unicode standard. You'd need to make use of something like ICU
> (icu-project.org) to keep your sanity, or implement a whole lot of
> that code base yourself...

Unfortunately, that requires support in the 3rd party software itself.

Anyway, I updated the 8-bit collation support from FreeBSD to current.
I sent it in March, and Stephan reviewed it at the previous hackathon:
https://github.com/pasosdeJesus/adJ/blob/OBSD_CURRENT/arboldes/usr/src/03-cotejacion.patch

I also updated the xlocale patch, also based on FreeBSD, that I sent in
February:
https://github.com/pasosdeJesus/adJ/blob/OBSD_CURRENT/arboldes/usr/src/04-xlocale-wchar.patch
https://github.com/pasosdeJesus/adJ/blob/OBSD_CURRENT/arboldes/usr/src/05-xlocale-ctype.patch
https://github.com/pasosdeJesus/adJ/blob/OBSD_CURRENT/arboldes/usr/src/06-xlocale-str.patch

I also updated the patch for regress/lib/libc/locale/check_isw that I
sent in March:
https://github.com/pasosdeJesus/adJ/blob/OBSD_CURRENT/arboldes/usr/src/01-check_isw.patch

I want to compare it in detail with the implementation that Martin sent.
A few differences I see so far:

* I changed the sources in order to avoid runetype.h in /usr/include
* Martin implemented some pieces that I have not (e.g. strtorx_l), which
  I want to merge into what I have

> If we bother with collation I think we should try to do better.

I agree that full Unicode support is desirable, but IMHO having 8-bit
collation is better than nothing. For example, without it Spanish
speakers see ñ, á and other special characters at the end of sorted
results in programs that support collation, like PostgreSQL.
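
To make the problem concrete, here is a minimal sketch of table-based 8-bit
collation. It is not the FreeBSD code from the patch above. It assumes
ISO-8859-1 input, handles only a handful of lowercase Spanish letters, and
the names primary_weight and latin1_coll are made up for this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical primary weight for one ISO-8859-1 byte.
 * Accented vowels share the weight of their base letter;
 * ñ is treated as its own letter between n and o.
 */
static int
primary_weight(unsigned char c)
{
	switch (c) {
	case 0xE1: return 'a' * 2;	/* á */
	case 0xE9: return 'e' * 2;	/* é */
	case 0xED: return 'i' * 2;	/* í */
	case 0xF3: return 'o' * 2;	/* ó */
	case 0xFA: return 'u' * 2;	/* ú */
	case 0xF1: return 'n' * 2 + 1;	/* ñ */
	default:   return c * 2;
	}
}

/*
 * Compare two ISO-8859-1 strings: primary weights first, then raw
 * bytes as a tie-break so that "a" still sorts before "á".
 */
int
latin1_coll(const char *a, const char *b)
{
	const unsigned char *p = (const unsigned char *)a;
	const unsigned char *q = (const unsigned char *)b;
	size_t i;

	assert(a != NULL && b != NULL);
	for (i = 0; ; i++) {
		int d = primary_weight(p[i]) - primary_weight(q[i]);

		if (d != 0)
			return d;
		if (p[i] == '\0')	/* both strings ended together */
			break;
	}
	return strcmp(a, b);
}
```

With plain strcmp the byte 0xF1 (ñ) compares greater than every ASCII
letter, so "ñu" lands after "o"; with the table above it lands between
"n" and "o".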

> The shim is going to be a lot less work, and doesn't preclude an
> implementation inside libc at a later stage.

Thanks for showing the right direction.  I'll look into it as soon as
I have more time; at least I know what is needed and how big it is.

Since xlocale is part of POSIX, shouldn't we try to get it into libc
sooner?

I think the xlocale additions to POSIX came from the Open Group Technical
Standard, 2006, Extended API Set Part 4.  That is what I understand
from the documentation of xlocale functions like strcoll_l; see the bottom of:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strcoll.html
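
For reference, this is roughly how a program uses the per-locale interface
from POSIX.1-2008. It is only a sketch: coll_in_locale is a hypothetical
helper name, and it assumes a libc that already provides newlocale,
strcoll_l and freelocale through <locale.h> and <string.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/*
 * Compare a and b under the collation rules of the named locale,
 * without touching the global locale set by setlocale().
 * Returns 0 and stores the strcoll_l() result in *result,
 * or returns -1 if the locale object cannot be created.
 */
int
coll_in_locale(const char *name, const char *a, const char *b, int *result)
{
	locale_t loc;

	assert(name != NULL && a != NULL && b != NULL && result != NULL);

	/* Only the collation category is taken from the named locale. */
	loc = newlocale(LC_COLLATE_MASK, name, (locale_t)0);
	if (loc == (locale_t)0)
		return -1;

	*result = strcoll_l(a, b, loc);
	freelocale(loc);
	return 0;
}
```

A caller would pass a locale name such as "C" or, where installed, a
Spanish locale; which names are available depends on the system.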

However, if Martin prefers to make a new library for this, I hope the
patches I sent can be useful.

--
God, thank you for your infinite love.
-- Vladimir Támara Patiño. http://vtamara.pasosdeJesus.org/
 http://www.pasosdejesus.org/dominio_publico_colombia.html
