On 12-07-2020 at 16:26, Nils König wrote:
> I agree that a UTF-8 BOM is usually not necessary.

In UTF-8 a BOM is never necessary: UTF-8 is defined byte by byte, so
there is no byte order to mark.  It's just that the annoying fools at
Microsoft have made their software so that things don't work when there
is *no* BOM.  Christ... how stupid can they be?
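For reference: in UTF-8 the "BOM" is simply the three bytes EF BB BF, the UTF-8 encoding of U+FEFF, and software that insists on it uses it only as a signature. A minimal sketch of such a signature check; the function name `starts_with_utf8_bom` is hypothetical and not taken from nano or from any Windows API.

```c
#include <stdbool.h>
#include <stddef.h>

/* The UTF-8 encoding of U+FEFF, the "byte order mark". */
static const unsigned char utf8_bom[3] = { 0xEF, 0xBB, 0xBF };

/* Return true when the buffer starts with a UTF-8 BOM.  The length check
 * keeps us from reading past the end of a buffer shorter than three bytes. */
bool starts_with_utf8_bom(const unsigned char *buf, size_t len)
{
	return len >= 3 && buf[0] == utf8_bom[0] &&
			buf[1] == utf8_bom[1] && buf[2] == utf8_bom[2];
}
```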

> With the explanation, having nano's current behaviour seems like a valid
> approach, though there's a chance a user, who isn't aware of it, might move
> or delete the BOM by accident.

Again, the dependency on a BOM in a UTF-8 file is an idiocy of
Microsoft and other Windows software.

> FWIW I would have expected leading BOM/NoBOM to be an option when saving with
> ^O (like DOS/Mac-Format), and to keep the status quo by default.

No-no-no, horrible!  The user ought not to be aware of the presence
of a BOM.  Software that accepts UTF-8 ought not to require a BOM.

Nano is a simple editor, a Unix editor.  It is meant for editing emails,
configuration files, shell scripts, and other plain text files.  There
are never any BOMs there.  And now, because some people want to use nano
to edit files with a silly required format, nano must adapt and treat a
BOM as a sacred trio of bytes?

If it were a simple change, a few lines in one or two places, I could
do that.  But... it is complicated: <Left>, <Home>, typing, pasting,
backspacing, ... they are all affected.  And all of them must treat
that trio as sacred.  It's annoying.
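To illustrate why every command is affected: if the BOM were treated as untouchable, each cursor movement and editing action would need a guard along these lines. This is a hypothetical sketch, not nano code; it only borrows the idea, visible in the attached patch, that a line is a byte string and the cursor is a byte offset (`current_x`). The names `leftmost_allowed_x`, `move_left`, `move_home`, and `may_backspace` are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the smallest byte offset the cursor may occupy.
 * On the first line of a file that starts with a UTF-8 BOM, bytes 0..2
 * belong to the BOM and must never be entered. */
size_t leftmost_allowed_x(const char *line, bool is_first_line)
{
	/* strncmp() stops at the terminating NUL, so short lines are safe. */
	if (is_first_line && strncmp(line, "\xEF\xBB\xBF", 3) == 0)
		return 3;
	return 0;
}

/* Hypothetical <Left>: step one byte left, but never into the BOM.
 * (A real editor would step over whole multibyte characters.) */
size_t move_left(const char *line, bool is_first_line, size_t x)
{
	size_t min_x = leftmost_allowed_x(line, is_first_line);

	return (x > min_x) ? x - 1 : min_x;
}

/* Hypothetical <Home>: go to the first position after any BOM. */
size_t move_home(const char *line, bool is_first_line)
{
	return leftmost_allowed_x(line, is_first_line);
}

/* Hypothetical <Backspace> guard: only delete when the byte before the
 * cursor is not part of the BOM. */
bool may_backspace(const char *line, bool is_first_line, size_t x)
{
	return x > leftmost_allowed_x(line, is_first_line);
}
```

Typing and pasting would be covered only indirectly, since text is inserted at the cursor and the cursor never gets left of the BOM; cutting a whole line would still need its own check.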

> Or for the BOM to be visible. Though being visible would contradict the 
> interpretation as a zero-width-nb-space, so maybe not.

The Unicode standard says that the Byte Order Mark ought to be an
invisible character.  (Vim doesn't care about that and shows the BOM as
<feff> when in the middle of a file.)  Also, it would require entirely
reworking the way nano displays lines.  I'm not willing to do that.
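Vim's `<feff>` is the code point the three bytes encode: U+FEFF, ZERO WIDTH NO-BREAK SPACE, which is why a conforming display shows nothing for it. A minimal sketch of the standard three-byte UTF-8 decoding that turns EF BB BF into 0xFEFF; `decode_utf8_3byte` is a hypothetical helper, and it skips the overlong and surrogate checks a real decoder needs.

```c
/* Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into
 * its code point.  The caller must supply at least three readable bytes.
 * Returns -1 when the bytes do not have the three-byte pattern.
 * (Overlong and surrogate checks are omitted for brevity.) */
long decode_utf8_3byte(const unsigned char *s)
{
	if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
		return -1;

	/* 4 payload bits from the lead byte, 6 from each continuation byte. */
	return ((long)(s[0] & 0x0F) << 12) | ((long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
}
```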

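I've contemplated adding the attached patch, which merely beeps and shows
"Byte Order Mark" on the status bar when the cursor sits on a BOM, but
even then the user could still backspace over the BOM or cut the line
without noticing.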

If nano were to handle a BOM properly, it would have to remove a BOM
whenever a file is read, and add it back when the file is written.  But
that would make it impossible to delete an unwanted BOM with a simple
backspace.  The user would then need to fall back on a tool like dos2unix.
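A minimal sketch of the "remove on read, add back on write" approach, assuming a hypothetical in-memory `struct textfile` with a `had_bom` flag; none of these names exist in nano. Writing to a string buffer instead of a file keeps the example self-contained.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BOM      "\xEF\xBB\xBF"
#define BOM_LEN  3

/* Hypothetical per-file state. */
struct textfile {
	bool had_bom;   /* whether the file started with a BOM when read */
	char *text;     /* contents as the user sees them, BOM removed */
};

/* On reading: drop a leading BOM from the text (in place) and remember
 * that it was there.  strncmp() stops at NUL, so short texts are safe. */
void strip_bom_on_read(struct textfile *f)
{
	f->had_bom = (strncmp(f->text, BOM, BOM_LEN) == 0);
	if (f->had_bom)
		memmove(f->text, f->text + BOM_LEN, strlen(f->text + BOM_LEN) + 1);
}

/* On writing: put the BOM back in front if the file originally had one.
 * Writes at most outsize bytes (NUL-terminated) and returns the length
 * that snprintf() reports. */
int restore_bom_on_write(const struct textfile *f, char *out, size_t outsize)
{
	return snprintf(out, outsize, "%s%s", f->had_bom ? BOM : "", f->text);
}
```

Because the BOM lives only in the `had_bom` flag, no keystroke in the buffer can remove it, which is exactly the drawback described above: getting rid of it would need a separate toggle or an external tool.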

Benno
diff --git a/src/nano.c b/src/nano.c
index 8e8b9952..db213857 100644
--- a/src/nano.c
+++ b/src/nano.c
@@ -1649,6 +1649,8 @@ void process_a_keystroke(void)
 #endif
 }
 
+#define byte(n)  ((unsigned char)openfile->current->data[openfile->current_x + (n)])
+
 int main(int argc, char **argv)
 {
 	int stdin_flags, optchr;
@@ -2489,6 +2491,13 @@ int main(int argc, char **argv)
 		lastmessage = VACUUM;
 		as_an_at = TRUE;
 
+#if defined(ENABLE_UTF8) && !defined(NANO_TINY)
+		/* Tell the user when the cursor sits on a BOM. */
+		if (byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF) {
+			statusline(NOTICE, _("Byte Order Mark"));
+			beep();
+		}
+#endif
 		/* Refresh just the cursor position or the entire edit window. */
 		if (!refresh_needed) {
 			place_the_cursor();
