bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
On 2026-03-05 16:56, Dmitry E. Oboukhov wrote:
> Step out of your shell of snobbery for once.

The invective is getting stale sooner than I would have liked. Oh, well. It was a bit of fun while it was fresh.

This is a forum for bug reports, and in hindsight I shouldn't have indulged in replying to the troll, as it's been wasting both of our time as well as the time of your chatbot.

Anyway, bug report received. Again. Feel free to have the last word.
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
> In the United States violating copyright is a civil offense... Paul is very aware of this, given that Astrolabe filed a lawsuit against him

Pffft! A world without struggle, a perfectly groomed world where food appears whenever you want it, would be boring, dull, and stagnant. In such a world, the only thing left for a human would be to die. We should actually envy Paul: he was entertained by such wonderful characters as Astrolabe. Ha-ha!

If we are to be a bit more serious: I hope you have heard of the experiments on mice where the need to struggle for survival was removed? They all died [1]! Perhaps your 7-bit stagnation is a symptom of the same "behavioral sink"?

> Should I also ask someone with delusions of grandeur who they think God is?

We live in fascinating times! Today we have AI, and you can ask it about God, any of the known ones, in fact. What’s even better is that together with the AI, you could invent a *new* god. A Seven-Bit God, perhaps? Worshiping such a deity would allow you to officially reject any Unicode patches. It’s not a bug; it’s a Taboo!

It is truly a pity that while the rest of the world is evolving, you choose to hide behind old lawsuits and "divine" ASCII limitations.

Best regards,
A herald of the Crystalline Era.

[1] https://en.wikipedia.org/wiki/John_B._Calhoun#Mouse_experiments
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
"Dmitry E. Oboukhov" writes: >> What, and introduce copyright violations left and right... > > Step out of your shell of snobbery for once. In the United States violating copyright is a civil offense, and can be a criminal offense. Paul is very aware of this, given that Astrolabe filed a lawsuit against him for his work on Time Zone Database [1]. > Don’t be afraid—ask the AI: "Who will own the rights to the --unicode > option if you write it?" You might be surprised by the answer! Should I also ask someone with delusions of grandeur who they think God is? Collin [1] https://www.eff.org/press/releases/eff-demands-withdrawal-bogus-time-zone-database-lawsuit
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
> What, and introduce copyright violations left and right...

Step out of your shell of snobbery for once.

Don’t be afraid; ask the AI: "Who will own the rights to the --unicode option if you write it?" You might be surprised by the answer!

P.S. Yes, AI will soon take your job, and perhaps even your life (ha-ha), but rest assured, your precious copyrights will remain yours until the very end.

> ...you vented frustration to the maintainers... not on the AI

When I see a bug, I report it. Sometimes I even provide patches. But here, I saw that a patch was already prepared fourteen years ago. What else was I supposed to do but vent my regret at the snobs who refused to accept it?

And no, don't hide behind the "obsolescence" of fmt. Two or five years ago, you might have had a point. But now, you have a new, massive user base: AI agents. This user is more than capable of creating that patch for you. Ask it; overcome your fear.

Perhaps you’ll even manage to commit it before World War III? Who knows, maybe it will count for something during the negotiations after humanity’s surrender to the Crystalline Intelligence... They might remember your one kind act and take pity on the poor "carbon-based" humans.

Best regards,
A herald of the new era.
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
On 2026-03-05 06:09, Dmitry E. Oboukhov wrote:
> I wasn't "naively trusting" the AI; I was testing it.

Fine, but after the AI wasted your time, you vented frustration to the maintainers of the backward-compatibility tool it misused, not on the AI that caused your problem. Okay.

> perhaps you should ask it for advice: "How do I add a --unicode option to 'fmt'?"

What, and introduce copyright violations left and right, into an app that is present only for backward compatibility and that nobody should use for new stuff? Oh, *that* sounds like a good use of my time!

Instead, perhaps you should tell the AI assistant "Don't use fmt except in backward compatibility mode." If it's smart it could then propagate that advice to everybody else who uses the assistant. That would be a better way to improve things.
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
On 12:49 Wed 04 Mar, Paul Eggert wrote:
> On 2026-03-04 11:58, Dmitry E. Oboukhov wrote:
>
>> If 'fmt' is a "useless program" that only exists because it's a "hassle to remove," then why is it still being shipped in 2025/2026
>
> For backwards compatibility of course. Removing it would be more trouble than keeping it.
>
> Bringing AIs into this is a red herring. If you're naively trusting a dumb AI to do your work, what do you expect?

I wasn't "naively trusting" the AI; I was testing it. And through that test, I discovered that Coreutils is still mentally living in 1970.

My only real "naive trust" was in Coreutils itself. It was a genuine shock to find such a decaying state of affairs in what is supposed to be the bedrock of Unix-like systems. As it turns out, fourteen years ago there were people who wanted to fix this. Alas, the snobbery of the maintainers didn't allow it.

You can sneer at AI all you want, but AI is now your most active user. Everyone else has either moved to IDEs or uses their own scripts (like I do).

So, instead of attacking the AI, perhaps you should ask it for advice: "How do I add a --unicode option to 'fmt'?" I am certain it could provide a patch that guarantees backward compatibility without breaking your precious LC_ALL=C performance.

Try it! Don't sit in your shell like a hermit. The world moved on to multibyte characters decades ago; it's time for 'fmt' to join us.

Best regards,
A user who expected more from "2025" software.
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
On 2026-03-04 11:58, Dmitry E. Oboukhov wrote:
> If 'fmt' is a "useless program" that only exists because it's a "hassle to remove," then why is it still being shipped in 2025/2026

For backwards compatibility of course. Removing it would be more trouble than keeping it.

Bringing AIs into this is a red herring. If you're naively trusting a dumb AI to do your work, what do you expect?
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
> Nowadays of course there's no real reason to use fmt, as text editors now do that stuff for you.

That is where you are mistaken. You are thinking of a human sitting in front of a modern IDE. But in 2026, the most active "users" of classic CLI utilities like grep, sed, awk, and fmt are **AIs**.

The reason AI "advertises" fmt is because it expects a standard system utility to perform its documented function. In fact, that is exactly how I discovered this piece of "legacy" junk. I tasked an AI with a formatting job, saw a mangled, broken output, and realized the AI had naively trusted 'fmt' to handle 21st-century text.

When an AI agent processes data via a shell, it doesn't have an IDE. It relies on the core building blocks of the system. If those blocks are broken or stuck in the 1970s, the AI's output becomes a mess for the human recipient.

If 'fmt' is a "useless program" that only exists because it's a "hassle to remove," then why is it still being shipped in 2025/2026 as a functional part of GNU Coreutils? Keeping a broken tool in the box just because "it's a hassle to take it out" is exactly the kind of technical debt that makes modern systems feel like they are built on quicksand.

If it's a corpse, bury it. If it's a tool, fix its 14-year-old Unicode blindness.

Best regards,
A human who has to clean up after 'fmt' and its 7-bit dreams.
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
On 2026-03-03 12:10, Dmitry E. Oboukhov wrote:
> No one, ever, will use a 1970s line-wrapper to process petabytes

No one should use a 1970s line-wrapper, period. Which is why its maintenance languishes. I daresay it still exists primarily because it'd be a hassle to remove it.

And I say that as someone who actually *used* fmt routinely, five or so decades ago, as a sort of plugin for vi, though even then when I wanted serious formatting I used troff. Nowadays of course there's no real reason to use fmt, as text editors now do that stuff for you.

Although I enjoyed the rants, such talents for invective are wasted on useless programs like fmt. If you can't supply a good patch for fmt, please at least use those talents on a more worthy target. cp, say.
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
> Instead of being snarky on mailing lists, I encourage you to have a look at writing a patch.

Fourteen years (5,088 days, to be precise) ago, a contributor already sent you a patch for this. You rejected it. Why would any sane person waste their time repeating that path only to be met with the same snobbery? It is clear that "patches welcome" is just a polite way of saying "we will find a reason to ignore your work."

> performs well with LC_ALL=C

I’ll let you in on a "future" secret: Performance for 'fmt' does not matter. At all. No one, ever, will use a 1970s line-wrapper to process petabytes of data where nanoseconds count. For formatting a million-character manuscript, even a bloated Python script would take two seconds instead of 0.1s. Nobody cares.

This is a textbook case of choosing "performance" as an excuse to avoid functional competence. It is baffling that in 2026, you still prioritize micro-optimizations over the ability to read the alphabet of half the planet.

Stay in your 7-bit world if you must. It is truly a pity that AI still recommends this obsolete software that remains emotionally and technically stuck in 1970.

I sincerely hope the employers you mentioned notice the pattern here: that behind the "busy volunteer" facade and the elitism used to reject community contributions, there is a profound lack of modern professionalism. Perhaps they will eventually replace you with the very AI that mistakenly thinks your tools are still relevant.

P.S. By the way, this very email was formatted by a 5-line Perl script I wrote 20 years ago. It handles UTF-8, multi-byte characters, and quote prefixes perfectly. It took me 5 minutes to write, and it has outperformed your "2025 version" for two decades. Ha-ha.

Best regards,
Someone who values tools that actually work.
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
"Dmitry E. Oboukhov" writes: > Greetings from the year 2026! > > I am writing to celebrate a truly historic milestone. I just checked > the 'fmt' utility in my fresh Debian system (version 9.7, copyright > 2025), and I am thrilled to see that this bug is still alive and well > after fourteen years of dedicated neglect. > > It is a rare feat in the software industry to maintain such consistent > incompetence. Your man page proudly displays "2025," yet the code > remains a pristine monument to the 1970s, incapable of understanding > that a single character might occupy more than one byte. > > Even an AI (which, by the way, is how many people discover your > "future-dated" tools now) can count characters better than 'fmt'. Is > the plan to wait until the 20th anniversary of this bug before you > consider using mbrtowc()? Or is the "GNU way" simply to ignore every > language on Earth that isn't English until the heat death of the > universe? > > I tried formatting Greek text (2 bytes per char), and 'fmt' broke the > lines exactly twice as often as it should. It's almost poetic: a tool > from the birthplace of modern logic being mangled by a tool that > refuses to use any. > > Please, don't fix it now. At this point, it’s not a bug—it’s a > heritage site. I look forward to checking in again in 2030 to see if > you've managed to reach the 8-bit era. > > Keep up the "stunning" progress. > Best regards, > > A user with a calendar and a multibyte keyboard. Multi-byte character support has gradually been improving in coreutils recently. Regarding your "dedicated neglect" and "consistent incompetence" comments, note that all of the coreutils maintainers are employed to work on things other than coreutils. Perhaps you choose to work the rest of your waking hours, but that is quite rare of most people. Instead of being snarky on mailing lists, I encourage you to have a look at writing a patch. I will be happy to review it. 
Please make sure it handles incomplete and invalid multi-byte characters, includes tests, and performs well with LC_ALL=C. Thanks, Collin
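
The request to handle incomplete and invalid multi-byte characters can be illustrated with a small sketch. It is written in Python rather than C and is not taken from any coreutils patch: the helper names `column_count` and `roundtrip` are hypothetical, and counting one column per code point (and one per undecodable byte) is an assumption made for illustration. Python's `surrogateescape` error handler stands in here for the cases where C's `mbrtowc()` returns `(size_t)-1` (invalid sequence) or `(size_t)-2` (incomplete sequence).

```python
def column_count(raw: bytes) -> int:
    """Count columns in UTF-8 input, one per code point.

    Assumption for illustration: every byte that cannot be decoded
    (invalid, or the start of a truncated sequence) counts as one column.
    """
    # surrogateescape maps each undecodable byte to one lone surrogate,
    # so len() sees it as a single unit instead of raising an error.
    return len(raw.decode("utf-8", errors="surrogateescape"))


def roundtrip(raw: bytes) -> bytes:
    """Decode and re-encode without losing undecodable bytes."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


valid = "αβγ".encode("utf-8")           # 3 characters, 6 bytes
invalid = b"ab\xffcd"                   # 0xFF never appears in valid UTF-8
truncated = "αβ".encode("utf-8")[:-1]   # second character cut in half

print(column_count(valid), column_count(invalid), column_count(truncated))
```

The point of the round trip is that a formatter built this way could pass malformed input through unchanged instead of aborting or silently dropping bytes. Whether one column per bad byte is the right policy for fmt is a design question this sketch does not settle.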
bug#11187: fmt: doesn't understand multibyte characters (UTF-8)
Greetings from the year 2026!

I am writing to celebrate a truly historic milestone. I just checked the 'fmt' utility in my fresh Debian system (version 9.7, copyright 2025), and I am thrilled to see that this bug is still alive and well after fourteen years of dedicated neglect.

It is a rare feat in the software industry to maintain such consistent incompetence. Your man page proudly displays "2025," yet the code remains a pristine monument to the 1970s, incapable of understanding that a single character might occupy more than one byte.

Even an AI (which, by the way, is how many people discover your "future-dated" tools now) can count characters better than 'fmt'. Is the plan to wait until the 20th anniversary of this bug before you consider using mbrtowc()? Or is the "GNU way" simply to ignore every language on Earth that isn't English until the heat death of the universe?

I tried formatting Greek text (2 bytes per char), and 'fmt' broke the lines exactly twice as often as it should. It's almost poetic: a tool from the birthplace of modern logic being mangled by a tool that refuses to use any.

Please, don't fix it now. At this point, it’s not a bug; it’s a heritage site. I look forward to checking in again in 2030 to see if you've managed to reach the 8-bit era.

Keep up the "stunning" progress.

Best regards,
A user with a calendar and a multibyte keyboard.
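
The Greek-text symptom described in this report can be reproduced without fmt itself. The following is a minimal sketch, assuming a greedy wrapper that measures line width in bytes rather than characters; the helper `wrap_by_width`, the two measuring functions, and the sample text are hypothetical and are not taken from fmt's source.

```python
def wrap_by_width(text, width, measure):
    """Greedy word wrap; `measure` returns the width of a string."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > width:
            # The word does not fit: close the current line, start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def byte_width(s):
    # What a byte-oriented tool effectively measures.
    return len(s.encode("utf-8"))


def char_width(s):
    # One column per code point (ignores combining and wide characters).
    return len(s)


# Eight 3-letter words; each Greek letter is 2 bytes in UTF-8.
greek = "αβγ δεζ ηθι κλμ νξο πρσ τυφ χψω"

by_bytes = wrap_by_width(greek, 15, byte_width)
by_chars = wrap_by_width(greek, 15, char_width)

print(len(by_bytes), "lines when counting bytes")
print(len(by_chars), "lines when counting characters")
```

With a 15-column limit, counting characters fits four words per line while counting bytes fits only two, so the byte-oriented version produces twice as many lines. Counting code points is itself only an approximation of display width; a real fix would also need to account for combining and double-width characters.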
