Hello Benno,

Thanks for your quick reply.

On Sun, Jul 12, 2020 at 13:15:41 +0200, Benno Schulenberg wrote:
> Ideally, a UTF-8 file should not contain a Byte Order Mark.  What if
> I concatenate several files together?  Then the result might contain
> BOMs embedded in the text.
> 
> As far as I know, BOM is only a problem with Windows and Google files.
> I do not know of any tool on Unix that adds a BOM to a UTF-8 file.

I agree that a UTF-8 BOM is usually not necessary. Probably for the
Windows compatibility reasons you mention, 'aegisub' always includes a
BOM when saving as UTF-8 (and concatenating two valid ASS files
wouldn't produce a new valid ASS file anyway).


> […]  And the Unicode standard
> does not forbid the BOM from occurring elsewhere -- in that case
> it should be considered as a Zero Width Non Breaking Space.

Thanks for pointing that out; I wasn't aware of this. In that case it is
probably just good practice(?) to have a BOM only at the beginning.
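
To illustrate the concatenation case for myself, here is a minimal sketch
in Python. It has nothing to do with nano's code, and the helper name
strip_leading_bom is made up for this example. It joins two byte strings
that each start with a UTF-8 BOM, removes only the leading one, and counts
the U+FEFF that stays embedded in the text:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_leading_bom(data: bytes) -> bytes:
    """Drop one UTF-8 BOM at the very start; later occurrences are kept.

    Hypothetical helper for illustration only.
    """
    return data[len(BOM):] if data.startswith(BOM) else data


# Two "files", each saved with a leading BOM.
first = BOM + "one\n".encode("utf-8")
second = BOM + "two\n".encode("utf-8")

# Byte-wise concatenation, as a plain 'cat' of the two files would do.
joined = first + second

# Only the leading BOM is removed; the second one remains as U+FEFF.
text = strip_leading_bom(joined).decode("utf-8")
embedded = text.count("\ufeff")

print(repr(text), embedded)
```

The U+FEFF left in the middle is the one that would then have to be read
as a zero-width no-break space.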


> I could mitigate the problem by placing the cursor after the BOM
> when a file is opened.  (See attached patch.)  But you can still
> delete the BOM with <Backspace>, or put the cursor on it with
> <Left> or <Home>.  For nano, all characters are just a group of
> bytes that can be added, deleted, restored, searched, and saved.
> 
> If I would make the BOM uneditable and unmovable, people could
> no longer use nano to get rid of a BOM in a file.
> 
>   https://bugs.launchpad.net/ubuntu/+source/nano/+bug/1045062

With that explanation, nano's current behaviour seems like a valid
approach, though there's a chance that a user who isn't aware of the BOM
might move or delete it by accident. The patch would make this less
likely (but it would still be possible).

FWIW, I would have expected leading BOM/no BOM to be an option when
saving with ^O (like DOS/Mac format), keeping the status quo by default.
(Looking at other editors, vim has this as :set bomb / :set nobomb.)

Another option would be for the BOM to be visible, though that would
contradict its interpretation as a zero-width no-break space, so maybe
not.

Nils
