On 5 August 2013 21:35, Cedric Blancher <[email protected]> wrote:
> On 22 July 2013 16:28, Glenn Fowler <[email protected]> wrote:
>>
>> On Mon, 22 Jul 2013 12:10:32 +0200 Cedric Blancher wrote:
>>> On 10 June 2013 03:50, Glenn Fowler <[email protected]> wrote:
>>> >
>>> > On Mon, 10 Jun 2013 03:47:08 +0200 Roland Mainz wrote:
>>> >> On Sun, Jun 9, 2013 at 4:44 AM, Glenn Fowler <[email protected]> wrote:
>>> >> > I knew I would get into semantic trouble here
>>> >> > I'm not complaining/deriding the efficacy of iswrune()
>>> >> > only that it has no bearing on any posix compliant utility
>>> >
>>> >> OK... here is the question which bothers me:
>>> >> tr -C does require sorting characters, right? How do we sort
>>> >> characters which do not have an assigned meaning?
>>> >
>>> > strcoll()
>>> >
>>> >> > if anyone wants to start a discussion about new utility option(s)
>>> >> > that rely on iswrune() and what ast utilities should be affected, great
>>> >> >
>>> >> > for systems that do not supply iswrune() portability remains a big issue,
>>> >> > current practice notwithstanding -- it will always be an
>>> >> > iffe|config game of catchup vs. the iw*() collection du jour
>>> >
>>> >> BTW: re |iswrune()| emulation... perl has the perl regex match
>>> >> \p{Unassigned} ... which creates the same matches as this script
>>> >> (assuming LC_ALL='en_US.UTF-8' and the locale's Unicode version matches
>>> >> the perl Unicode version):
>>> >> -- snip --
>>> >> set -o nounset
>>> >
>>> >> typeset -i16 i
>>> >
>>> >> for (( i=0 ; i < 0x10FFFF ; i++ )) ; do
>>> >>       ch="${ printf "\u[${i/~(El)16#/}]" ; }"
>>> >
>>> >>       if [[ "$ch" != ~(Elr)[[:alpha:][:alnum:][:digit:][:print:][:cntrl:][:space:][:blank:][:punct:]] ]] ; then
>>> >>               printf "# match found: %q\n" "${i}"
>>> >>       fi
>>> >> done
>>> >
>>> >> print '# done.'
>>> >> -- snip --
>>> >
>>> >> |iswrune()| or not... IMO it would be nice to have something like
>>> >> \p{Unassigned} in normal egrep/xgrep regex, e.g. something like a
>>> >> [:_unassigned:] character class...
>>> >
>>> > [:rune:] would be a fine name for that class
>>
>>> There's still no [:rune:] emulation in libast :(
>>
>> that looks simple enough
>> but I'm not convinced it's correct
>> what about system and user defined classes
>> (there are notes on the list about some for chinese characters -- I forget the details)
>
> Maybe Roland can elaborate. He's an expert on such locales.
>
>> if those aren't handled then why provide a [:rune:] that might work maybe
>
> Chinese and Japanese locales have extra classes defined by the locale
> data, but they are *always* "extra", i.e. the characters have matches
> in the basic POSIX character classes but also match extra classes like
> isphonogram() or isideogram().
>
> Could we please get [:rune:] and a --weed-out-non-runes option for
> tr(1)?
>

Please?
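
For reference, a minimal sketch of the emulation discussed above: a
character counts as a "rune" if any of the basic classes claims it in the
current locale. The name rune_emulated() is hypothetical and is not part
of libast. It relies on the assumption stated above that locale-defined
classes only ever add to the basic POSIX classes, and it is not a real
iswrune().

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not a libast API.
 * Returns 1 if any basic class matches wc in the current locale, else 0.
 * iswalnum() already covers the alpha and digit classes, so the set below
 * is the same union the quoted ksh script tests.
 */
int rune_emulated(wint_t wc)
{
    return iswalnum(wc) || iswprint(wc) || iswcntrl(wc) ||
           iswspace(wc) || iswblank(wc) || iswpunct(wc);
}
```

A caller would normally run setlocale(LC_ALL, "") first so that the
classification follows the user's locale. The result then depends on the
Unicode version of that locale's data, same as with the script.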

Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
