Silas wrote:
> Bob Proulx wrote:
> >    iconv -f UTF-8 -t ASCII//TRANSLIT <filein >fileout
> 
> It seems it is not possible on NetBSD 9.0 iconv :-(

It looks like //TRANSLIT is a GNU glibc extension not available in
NetBSD's version of libc.  Sorry.
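
If it helps, here is a minimal sketch of a probe to check whether a
given system's iconv accepts the suffix at all.  The function name
iconv_has_translit is my own invention for illustration.  The probe
feeds plain ASCII so that it only tests whether the codeset name is
accepted, which is the step that fails in your error message.  An
implementation that accepts the name may still not transliterate
usefully, so treat a positive result as a hint only.

```shell
# Hypothetical helper: succeeds if the local iconv accepts a target
# codeset name carrying the //TRANSLIT suffix.
iconv_has_translit() {
    # Pure ASCII input, so only the acceptance of the codeset name is
    # tested, not the quality of any transliteration.
    if printf 'a\n' | iconv -f UTF-8 -t ASCII//TRANSLIT >/dev/null 2>&1
    then
        return 0
    else
        return 1
    fi
}

if iconv_has_translit
then
    echo "iconv accepts the //TRANSLIT suffix"
else
    echo "iconv rejects the //TRANSLIT suffix"
fi
```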

> $ echo 'pão' | iconv -f UTF-8 -t ASCII//TRANSLIT
> iconv: iconv_open(ASCII//TRANSLIT, UTF-8): Invalid argument

I can use iconv to convert from one codeset to another, but it
doesn't know how to transliterate.  Transliteration is not mentioned
in its documentation.

    man iconv

     -t    Specifies the destination codeset name as to_name.

And that is all it says.  So it can change codesets.

    $ echo 'pão' | iconv -f UTF-8 -t LATIN1 | od -tx1 -c
    0000000   70  e3  6f  0a                                                
      p 343   o  \n                                                

I passed the output through od to show the e3 byte it produces in
LATIN1 and to avoid the mishmash it would make here in what will be a
UTF-8 mailing.  But I can show that it can be converted back.

    $ echo 'pão' | iconv -f UTF-8 -t LATIN1 | iconv -f LATIN1 -t UTF-8
    pão
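
The same round trip can be checked in a script rather than by eye.
This is only a sketch.  It spells the test string as UTF-8 bytes with
printf so that it does not depend on the encoding of the script file,
and it assumes the local iconv knows the LATIN1 name as used above.
The variable names are mine.

```shell
# "pão" written out as UTF-8 bytes (ã is 0xc3 0xa3, octal 303 243).
original=$(printf 'p\303\243o')

# Convert to LATIN1 and straight back to UTF-8 without storing the
# intermediate LATIN1 bytes anywhere.
roundtrip=$(printf '%s' "$original" |
    iconv -f UTF-8 -t LATIN1 |
    iconv -f LATIN1 -t UTF-8)

if [ "$original" = "$roundtrip" ]
then
    echo "round trip preserved the text"
else
    echo "round trip changed the text"
fi
```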

> Is there something that could be installed from pkgsrc (or another
> iconv implementation) to make it work?

For transliteration it looks like you would need the GNU version of
iconv.  Sorry!

    https://manpages.debian.org/buster/manpages/iconv.1.en.html

    -t to-encoding, --to-code=to-encoding
        Use to-encoding for output characters.

        If the string //IGNORE is appended to to-encoding, characters that
        cannot be converted are discarded and an error is printed after
        conversion.

        If the string //TRANSLIT is appended to to-encoding, characters
        being converted are transliterated when needed and possible. This
        means that when a character cannot be represented in the target
        character set, it can be approximated through one or several
        similar looking characters. Characters that are outside of the
        target character set and cannot be transliterated are replaced
        with a question mark (?) in the output.

Bob
