Bug#310495: par: Does not handle UTF-8 multibyte characters properly

Teemu Likonen Thu, 19 Jan 2006 00:48:08 -0800

Hello,

On Wednesday 18 January 2006 13:58, you wrote:
> I have not been able to find any program that does UTF-8 multibyte
> character left and right justification for text files.


I have not found one either, sorry.

> If you can 
> point me to some source where I can find information on how this can
> be handled then perhaps I can try to figure out a patch to fix this.

I'm not a programmer, but I guess one just has to understand how UTF-8 
encoding works. "The old way" was to measure a string's length byte by 
byte, but that no longer works with UTF-8. The utf-8(7) manual page is 
probably a good start, and of course there are the Unicode Consortium's 
official definitions:

http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
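
To illustrate why byte counting breaks, here is a minimal Python sketch (an
illustration only, not code from "par"). It relies on the fact that every
UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes
that do not match this pattern gives the number of code points. The function
name is hypothetical, and the input is assumed to be valid UTF-8.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in valid UTF-8 data.

    Continuation bytes look like 10xxxxxx; every other byte starts a
    new code point, so only those are counted.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# "Hyvää päivää" written with escapes so the a-umlauts are precomposed U+00E4.
line = "Hyv\u00e4\u00e4 p\u00e4iv\u00e4\u00e4".encode("utf-8")

print(len(line))              # byte count: 17
print(utf8_char_count(line))  # code point count: 12
```

A justification routine that pads to a column using the byte count would
treat this line as five columns wider than it is. Note that the code point
count is still only an approximation of display width: combining characters
and double-width characters would need further handling.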

> According to the release notes "par" is OK with 8-bit characters but
> not multibyte so this is not a bug in the program vis-a-vis its
> documentation. Would it be OK with you if this bug was downgraded to
> wishlist?

I don't mind moving it to the wishlist. I don't use "par" anymore; I 
can't. Still, this is effectively becoming a bug, because Linux 
distributions have moved to UTF-8 locales and few languages can be 
written using only ASCII (codes $00 - $7f). In UTF-8, all other code 
points ($80 - $10ffff) need 2 to 4 bytes each. So, as "par" is mainly 
for reformatting human-language text, it has become pretty useless now 
that Unicode and UTF-8 are the norm.
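
The byte counts per code point range can be sketched in Python as follows.
This is a minimal illustration, with a hypothetical function name, that
checks itself against Python's built-in UTF-8 encoder. Surrogate code points
($d800 - $dfff) cannot be encoded in UTF-8 and are rejected here.

```python
def utf8_encoded_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if code_point <= 0x7F:       # ASCII
        return 1
    if code_point <= 0x7FF:      # e.g. Latin letters with diacritics
        return 2
    if code_point <= 0xFFFF:     # rest of the Basic Multilingual Plane
        return 3
    return 4                     # supplementary planes


# Cross-check a few sample code points against the built-in encoder.
for cp in (0x41, 0xE4, 0x20AC, 0x1F600):
    assert utf8_encoded_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_encoded_length(cp)} byte(s)")
```

The loop prints 1, 2, 3 and 4 bytes for the four samples, which is why any
text beyond plain ASCII throws off a tool that equates bytes with columns.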

> Thanks and regards,

Thank you too. A UTF-8 patch would be really nice.

 - TL
