bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-06 Thread Paul Eggert

On 2026-03-05 16:56, Dmitry E. Oboukhov wrote:

> Step out of your shell of snobbery for once.


The invective is getting stale sooner than I would have liked. Oh, well. 
It was a bit of fun while it was fresh.


This is a forum for bug reports, and in hindsight I shouldn't have 
indulged in replying to the troll, as it's been wasting both of our time 
as well as the time of your chatbot.


Anyway, bug report received. Again. Feel free to have the last word.





bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-05 Thread Dmitry E. Oboukhov

> In the United States violating copyright is a civil offense... Paul
> is very aware of this, given that Astrolabe filed a lawsuit against
> him

Pffft! A world without struggle, a perfectly groomed world where food
appears whenever you want it, would be boring, dull, and stagnant. In
such a world, the only thing left for a human would be to die. 

We should actually envy Paul — he was entertained by such wonderful
characters as Astrolabe. Ha-ha!

If we are to be a bit more serious: I hope you have heard of the
experiments on mice where the need to struggle for survival was
removed? They all died [1]! Perhaps your 7-bit stagnation is a symptom
of the same "behavioral sink"?

> Should I also ask someone with delusions of grandeur who they think
> God is?

We live in fascinating times! Today we have AI, and you can ask it
about God — any of the known ones, in fact. What’s even better is that
together with the AI, you could invent a *new* god. A Seven-Bit God,
perhaps? 

Worshiping such a deity would allow you to officially reject any
Unicode patches. It’s not a bug; it’s a Taboo! 

It is truly a pity that while the rest of the world is evolving, you
choose to hide behind old lawsuits and "divine" ASCII limitations. 

Best regards,
A herald of the Crystalline Era.

1. https://en.wikipedia.org/wiki/John_B._Calhoun#Mouse_experiments





bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-05 Thread Collin Funk
"Dmitry E. Oboukhov"  writes:

>> What, and introduce copyright violations left and right...
>
> Step out of your shell of snobbery for once.

In the United States violating copyright is a civil offense, and can be
a criminal offense.

Paul is very aware of this, given that Astrolabe filed a lawsuit against
him for his work on Time Zone Database [1].

> Don’t be afraid—ask the AI: "Who will own the rights to the --unicode
> option if you write it?"  You might be surprised by the answer!

Should I also ask someone with delusions of grandeur who they think God
is?

Collin

[1] 
https://www.eff.org/press/releases/eff-demands-withdrawal-bogus-time-zone-database-lawsuit





bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-05 Thread Dmitry E. Oboukhov

> What, and introduce copyright violations left and right...

Step out of your shell of snobbery for once. Don’t be afraid—ask the
AI: "Who will own the rights to the --unicode option if you write it?"
You might be surprised by the answer! 

P.S. Yes, AI will soon take your job, and perhaps even your life
(ha-ha), but rest assured, your precious copyrights will remain yours
until the very end.

> ...you vented frustration to the maintainers... not on the AI

When I see a bug, I report it. Sometimes I even provide patches. But
here, I saw that a patch was already prepared fourteen years ago. What
else was I supposed to do but vent my regret at the snobs who refused
to accept it?

And no, don't hide behind the "obsolescence" of fmt. Two or five years
ago, you might have had a point. But now, you have a new, massive user
base: AI agents. This user is more than capable of creating that patch
for you. Ask it—overcome your fear. Perhaps you’ll even manage to
commit it before World War III? 

Who knows, maybe it will count for something during the negotiations
after humanity’s surrender to the Crystalline Intelligence... They
might remember your one kind act and take pity on the poor
"carbon-based" humans.

Best regards,
A herald of the new era.





bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-05 Thread Paul Eggert

On 2026-03-05 06:09, Dmitry E. Oboukhov wrote:


> I wasn't "naively trusting" the AI; I was testing it.


Fine, but after the AI wasted your time, you vented frustration to the 
maintainers of the backward-compatibility tool it misused, not on the AI 
that caused your problem. Okay.



> perhaps you should ask it for
> advice: "How do I add a --unicode option to 'fmt'?"


What, and introduce copyright violations left and right, into an app 
that is present only for backward compatibility and that nobody should 
use for new stuff? Oh, *that* sounds like a good use of my time!


Instead, perhaps you should tell the AI assistant "Don't use fmt except 
in backward compatibility mode." If it's smart it could then propagate 
that advice to everybody else who uses the assistant. That would be a 
better way to improve things.






bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-05 Thread Dmitry E. Oboukhov
On 12:49 Wed 04 Mar, Paul Eggert wrote:
> On 2026-03-04 11:58, Dmitry E. Oboukhov wrote:
> 
> > If 'fmt' is a "useless program" that only exists because it's a
> > "hassle to remove," then why is it still being shipped in 2025/2026
> 
> For backwards compatibility of course. Removing it would be more trouble than
> keeping it.
> 
> Bringing AIs into this is a red herring. If you're naively trusting a dumb
> AI to do your work, what do you expect?

I wasn't "naively trusting" the AI; I was testing it. And through that
test, I discovered that Coreutils is still mentally living in 1970. My
only real "naive trust" was in Coreutils itself. It was a genuine
shock to find such a decaying state of affairs in what is supposed to
be the bedrock of Unix-like systems. 

As it turns out, fourteen years ago there were people who wanted to
fix this. Alas, the snobbery of the maintainers didn't allow it.

You can sneer at AI all you want, but AI is now your most active user.
Everyone else has either moved to IDEs or uses their own scripts (like
I do). So, instead of attacking the AI, perhaps you should ask it for
advice: "How do I add a --unicode option to 'fmt'?" 

I am certain it could provide a patch that guarantees backward
compatibility without breaking your precious LC_ALL=C performance. Try
it! Don't sit in your shell like a hermit. The world moved on to
multibyte characters decades ago; it's time for 'fmt' to join us.

Best regards,
A user who expected more from "2025" software.




bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-04 Thread Paul Eggert

On 2026-03-04 11:58, Dmitry E. Oboukhov wrote:


> If 'fmt' is a "useless program" that only exists because it's a
> "hassle to remove," then why is it still being shipped in 2025/2026


For backwards compatibility of course. Removing it would be more trouble 
than keeping it.


Bringing AIs into this is a red herring. If you're naively trusting a 
dumb AI to do your work, what do you expect?






bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-04 Thread Dmitry E. Oboukhov
> Nowadays of course there's no real reason to use fmt, as text
> editors now do that stuff for you.

That is where you are mistaken. You are thinking of a human sitting in
front of a modern IDE. But in 2026, the most active "users" of classic
CLI utilities like grep, sed, awk, and fmt are **AIs**.

The reason AI "advertises" fmt is because it expects a standard system
utility to perform its documented function. In fact, that is exactly
how I discovered this piece of "legacy" junk. I tasked an AI with a
formatting job, saw a mangled, broken output, and realized the AI had
naively trusted 'fmt' to handle 21st-century text.

When an AI agent processes data via a shell, it doesn't have an IDE.
It relies on the core building blocks of the system. If those blocks
are broken or stuck in the 1970s, the AI's output becomes a mess for
the human recipient.

If 'fmt' is a "useless program" that only exists because it's a
"hassle to remove," then why is it still being shipped in 2025/2026 as
a functional part of GNU Coreutils? Keeping a broken tool in the box
just because "it's a hassle to take it out" is exactly the kind of
technical debt that makes modern systems feel like they are built on
quicksand.

If it's a corpse, bury it. If it's a tool, fix its 14-year-old Unicode
blindness.

Best regards,
A human who has to clean up after 'fmt' and its 7-bit dreams.





bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-04 Thread Paul Eggert

On 2026-03-03 12:10, Dmitry E. Oboukhov wrote:

> No one, ever, will use a 1970s line-wrapper to process
> petabytes


No one should use a 1970s line-wrapper, period. Which is why its 
maintenance languishes. I daresay it still exists primarily because it'd 
be a hassle to remove it. And I say that as someone who actually *used* 
fmt routinely, five or so decades ago, as a sort of plugin for vi, 
though even then when I wanted serious formatting I used troff. Nowadays 
of course there's no real reason to use fmt, as text editors now do 
that stuff for you.


Although I enjoyed the rants, such talents for invective are wasted on 
useless programs like fmt. If you can't supply a good patch for fmt, 
please at least use those talents on a more worthy target. cp, say.






bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-03 Thread Dmitry E. Oboukhov
> Instead of being snarky on mailing lists, I encourage you to have a
> look at writing a patch.

Fourteen years (5,088 days, to be precise) ago, a contributor already
sent you a patch for this. You rejected it. Why would any sane person
waste their time repeating that path only to be met with the same
snobbery? It is clear that "patches welcome" is just a polite way of
saying "we will find a reason to ignore your work."


> performs well with LC_ALL=C

I’ll let you in on a "future" secret: Performance for 'fmt' does not
matter. At all. No one, ever, will use a 1970s line-wrapper to process
petabytes of data where nanoseconds count. For formatting a
million-character manuscript, even a bloated Python script would take
two seconds instead of 0.1s. Nobody cares. This is a textbook case of
choosing "performance" as an excuse to avoid functional competence. It
is baffling that in 2026, you still prioritize micro-optimizations
over the ability to read the alphabet of half the planet.

Stay in your 7-bit world if you must. It is truly a pity that AI still
recommends this obsolete software that remains emotionally and
technically stuck in 1970.

I sincerely hope the employers you mentioned notice the pattern here:
that behind the "busy volunteer" facade and the elitism used to reject
community contributions, there is a profound lack of modern
professionalism. Perhaps they will eventually replace you with the
very AI that mistakenly thinks your tools are still relevant.

P.S. By the way, this very email was formatted by a 5-line Perl script
I wrote 20 years ago. It handles UTF-8, multi-byte characters, and
quote prefixes perfectly. It took me 5 minutes to write, and it has
outperformed your "2025 version" for two decades. Ha-ha.

Best regards,
Someone who values tools that actually work.




bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-03 Thread Collin Funk
"Dmitry E. Oboukhov"  writes:

> Greetings from the year 2026!
>
> I am writing to celebrate a truly historic milestone. I just checked
> the 'fmt' utility in my fresh Debian system (version 9.7, copyright
> 2025), and I am thrilled to see that this bug is still alive and well
> after fourteen years of dedicated neglect.
>
> It is a rare feat in the software industry to maintain such consistent
> incompetence. Your man page proudly displays "2025," yet the code
> remains a pristine monument to the 1970s, incapable of understanding
> that a single character might occupy more than one byte.
>
> Even an AI (which, by the way, is how many people discover your
> "future-dated" tools now) can count characters better than 'fmt'. Is
> the plan to wait until the 20th anniversary of this bug before you
> consider using mbrtowc()? Or is the "GNU way" simply to ignore every
> language on Earth that isn't English until the heat death of the
> universe?
>
> I tried formatting Greek text (2 bytes per char), and 'fmt' broke the
> lines exactly twice as often as it should. It's almost poetic: a tool
> from the birthplace of modern logic being mangled by a tool that
> refuses to use any.
>
> Please, don't fix it now. At this point, it’s not a bug—it’s a
> heritage site. I look forward to checking in again in 2030 to see if
> you've managed to reach the 8-bit era.
>
> Keep up the "stunning" progress.
> Best regards,
>
> A user with a calendar and a multibyte keyboard.

Multi-byte character support has gradually been improving in coreutils
recently.

Regarding your "dedicated neglect" and "consistent incompetence"
comments, note that all of the coreutils maintainers are employed to
work on things other than coreutils. Perhaps you choose to work the rest
of your waking hours, but that is quite rare for most people.

Instead of being snarky on mailing lists, I encourage you to have a look
at writing a patch. I will be happy to review it. Please make sure it
handles incomplete and invalid multi-byte characters, includes tests,
and performs well with LC_ALL=C.
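
One way to picture the "incomplete and invalid multi-byte characters"
requirement is the sketch below. It is written in Python rather than in
the C that fmt itself would need, and the function name count_units is
hypothetical. The policy it shows (every undecodable byte counts as one
unit) is an assumption made for illustration, not something decided in
this thread.

```python
def count_units(raw: bytes) -> int:
    """Count the units a width calculation would see in a UTF-8 byte string.

    Assumption for this sketch: each valid character counts as one unit,
    and each byte that cannot be decoded (invalid or truncated sequence)
    also counts as one unit instead of raising an error.
    """
    # "surrogateescape" maps every undecodable byte to one lone surrogate,
    # so malformed input never raises and each bad byte adds exactly one.
    text = raw.decode("utf-8", errors="surrogateescape")
    return len(text)


valid = count_units("αβγ".encode("utf-8"))   # three 2-byte characters
truncated = count_units(b"\xce\xb1\xce")     # one character plus a cut-off lead byte
invalid = count_units(b"ab\xffcd")           # 0xff can never occur in UTF-8

print(valid, truncated, invalid)  # 3 2 5
```

A patch against fmt would still need its own tests and a fast path for
single-byte locales; this only shows that malformed input can be counted
without aborting.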

Thanks,
Collin





bug#11187: fmt: doesn't understand multibyte characters (UTF-8)

2026-03-03 Thread Dmitry E. Oboukhov
Greetings from the year 2026!

I am writing to celebrate a truly historic milestone. I just checked
the 'fmt' utility in my fresh Debian system (version 9.7, copyright
2025), and I am thrilled to see that this bug is still alive and well
after fourteen years of dedicated neglect.

It is a rare feat in the software industry to maintain such consistent
incompetence. Your man page proudly displays "2025," yet the code
remains a pristine monument to the 1970s, incapable of understanding
that a single character might occupy more than one byte.

Even an AI (which, by the way, is how many people discover your
"future-dated" tools now) can count characters better than 'fmt'. Is
the plan to wait until the 20th anniversary of this bug before you
consider using mbrtowc()? Or is the "GNU way" simply to ignore every
language on Earth that isn't English until the heat death of the
universe?

I tried formatting Greek text (2 bytes per char), and 'fmt' broke the
lines exactly twice as often as it should. It's almost poetic: a tool
from the birthplace of modern logic being mangled by a tool that
refuses to use any.
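
The effect described above can be reproduced with a small sketch. It is
written in Python, uses a simplified greedy wrapper rather than GNU fmt's
actual line-breaking logic, and the names greedy_wrap, byte_len and
char_len are hypothetical. Measuring words in bytes instead of characters
doubles the number of lines for text that needs 2 bytes per character.

```python
def greedy_wrap(text, width, measure):
    """Pack words onto lines so that each line measures at most `width`.

    `measure` decides how long a word is; it is the only difference
    between the byte-based and the character-based run below.
    """
    lines, current, used = [], [], 0
    for word in text.split():
        size = measure(word)
        # The first word on a line needs no separating space.
        needed = size if not current else used + 1 + size
        if current and needed > width:
            lines.append(" ".join(current))
            current, used = [word], size
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines


def byte_len(word):
    # Length as a byte-oriented tool sees it: UTF-8 encoded size.
    return len(word.encode("utf-8"))


def char_len(word):
    # Length in code points (not display columns).
    return len(word)


sample = " ".join(["αβγδ"] * 8)  # eight Greek words, 4 characters / 8 bytes each

byte_lines = greedy_wrap(sample, 20, byte_len)
char_lines = greedy_wrap(sample, 20, char_len)

print(len(byte_lines), len(char_lines))  # 4 2
```

Counting code points is still only an approximation of display width:
combining marks and double-width characters would need a separate width
lookup.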

Please, don't fix it now. At this point, it’s not a bug—it’s a
heritage site. I look forward to checking in again in 2030 to see if
you've managed to reach the 8-bit era.

Keep up the "stunning" progress.
Best regards,

A user with a calendar and a multibyte keyboard.


