On 5 August 2013 21:35, Cedric Blancher <[email protected]> wrote:
> On 22 July 2013 16:28, Glenn Fowler <[email protected]> wrote:
>>
>> On Mon, 22 Jul 2013 12:10:32 +0200 Cedric Blancher wrote:
>>> On 10 June 2013 03:50, Glenn Fowler <[email protected]> wrote:
>>> >
>>> > On Mon, 10 Jun 2013 03:47:08 +0200 Roland Mainz wrote:
>>> >> On Sun, Jun 9, 2013 at 4:44 AM, Glenn Fowler <[email protected]>
>>> >> wrote:
>>> >> > I knew I would get into semantic trouble here
>>> >> > I'm not complaining/deriding the efficacy of iswrune()
>>> >> > only that it has no bearing on any posix compliant utility
>>> >
>>> >> OK... here is the question which bothers me:
>>> >> tr -C does require to sort characters, right ? How do we sort
>>> >> characters which do not have an assigned meaning ?
>>> >
>>> > strcoll()
>>> >
>>> >> > if anyone wants to start a discussion about new utility option(s)
>>> >> > that rely on iswrune() and what ast utilities should be affected, great
>>> >> >
>>> >> > for systems that do not supply iswrune() portability remains a big
>>> >> > issue,
>>> >> > current practice notwithstanding -- it will always be an
>>> >> > iffe|config game of catchup vs. the iw*() collection du jour
>>> >
>>> >> BTW: re |iswrune()| emulation... perl has the perl regex match
>>> >> \p{Unassigned} ... which creates the same matches as this script
>>> >> (assuming LC_ALL='en_US.UTF-8' and locales Unicode version matches the
>>> >> perl unicode version):
>>> >> -- snip --
>>> >> set -o nounset
>>> >
>>> >> typeset -i16 i
>>> >
>>> >> for (( i=0 ; i < 0x10FFFF ; i++ )) ; do
>>> >>     ch="${ printf "\u[${i/~(El)16#/}]" ; }"
>>> >
>>> >>     if [[ "$ch" !=
>>> >> ~(Elr)[[:alpha:][:alnum:][:digit:][:print:][:cntrl:][:space:][:blank:][:punct:]]
>>> >>     ]] ; then
>>> >>         printf "# match found: %q\n" "${i}"
>>> >>     fi
>>> >> done
>>> >
>>> >> print '# done.'
>>> >> -- snip --
>>> >
>>> >> |iswrune()| or not... IMO it would be nice to have something like
>>> >> \p{Unassigned} in normal egrep/xgrep regex, e.g. something like a
>>> >> [:_unassigned:] character class...
>>> >
>>> > [:rune:] would be a fine name for that class
>>
>>> There's still no [:rune:] emulation in libast :(
>>
>> that looks simple enough
>> but I'm not convinced its correct
>> what about system and user defined classes
>> (there are notes on the list about some for chinese characters -- I forget
>> the details)
>
> Maybe Roland can elaborate. He's an expert for such locales.
>
>> if those aren't handled then why provide a [:rune:] that might work maybe
>
> Chinese and Japanese locales have extra classes defined by the locale
> data, but they are *always* "extra", i.e. the characters have matches
> in the basic POSIX character classes but also match extra classes like
> isphonogram() or is ideogram().
>
> Please, could we get [:rune:] and a --weed-out-non-runes option for
> tr(1), please?
>
Please?

Ced

--
Cedric Blancher <[email protected]>
Institute Pasteur
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
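
For reference, the class test in the quoted ksh93 script could be sketched in C roughly as below. This is only an illustration, not libast code: the name rune_like() is hypothetical, and it assumes (as argued above for the Chinese and Japanese locales) that every assigned character falls into at least one of the basic POSIX classes. The result depends on the locale selected with setlocale() and on the Unicode version behind that locale's data.

```c
#include <wctype.h>

/*
 * Hypothetical helper, not part of libast or any standard library.
 * Returns 1 if the current locale places wc in at least one of the
 * basic POSIX character classes, 0 otherwise. This mirrors the class
 * list used in the quoted script; a character in none of these classes
 * is treated as "not a rune".
 */
static int rune_like(wint_t wc)
{
    return iswalnum(wc)    /* alpha and digit */
        || iswprint(wc)    /* graph plus the space character */
        || iswpunct(wc)
        || iswcntrl(wc)
        || iswspace(wc)
        || iswblank(wc);
}
```

A caller would select the locale first, e.g. setlocale(LC_ALL, "") from <locale.h>, and then filter on rune_like(wc) for each wide character. Whether this matches a native iswrune() on systems that provide one is an open question.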
