[Edbrowse-dev] BOM

Karl Dahlke Fri, 13 Nov 2015 23:37:36 -0800

The Windows port has raised the issue of the byte order mark (BOM),
which is prevalent in Windows files but virtually nonexistent on Unix.
If we do choose to support this, I would read the BOM,
convert the file to utf8 for internal use, then convert it back with its BOM
if that file or any portion of it is written to disk.
There is a precedent for this.
An iso8859 file is converted to utf8, then converted back upon write.
Try it and see.
That works only for iso8859-1, though, and even this we may not support for long,
as Unix / Linux is almost 100% utf8 at this point.
Anyway there is some machinery in place.
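
To make the read / convert / write-back idea concrete, here is a minimal sketch in Python. It is not edbrowse code (edbrowse is written in C), and the names BOMS, to_internal and to_disk are made up for illustration. It also assumes that a file with no BOM is already utf8, which is a policy choice and not something settled above.

```python
import codecs

# Known BOMs and the codec for the bytes that follow them.
# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_internal(raw):
    """Return (utf8_bytes, bom). bom is b"" if the file had none."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            text = raw[len(bom):].decode(codec)
            return text.encode("utf-8"), bom
    # No BOM found: assume the bytes are already utf8 (hypothetical policy).
    return raw, b""


def to_disk(utf8_bytes, bom):
    """Convert the internal utf8 buffer back to the file's original form."""
    for known, codec in BOMS:
        if bom == known:
            return bom + utf8_bytes.decode("utf-8").encode(codec)
    return utf8_bytes


# A UTF-16 file as a Windows editor might write it.
raw = codecs.BOM_UTF16_LE + "niño\n".encode("utf-16-le")
buf, bom = to_internal(raw)      # buf is utf8, bom is remembered
restored = to_disk(buf, bom)     # same bytes as the original file
```

The only state that has to be kept per buffer is the BOM that was seen on read, which is much like remembering that a file came in as iso8859-1.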


The real key for me is the search and substitute commands.
These are under control of pcre, which runs in utf8 mode.
/ni.o/ will match niño, with the dot matching
the 2-byte utf8 character ñ (n tilde).
So if everything is utf8 inside then all the searches and substitutes
will work the way our international users would want and expect.
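
The difference between matching characters and matching bytes can be shown with a short sketch. It uses Python's re module as a stand-in, not pcre itself, so take it as an analogy for utf8 mode versus byte mode; the helper names are hypothetical.

```python
import re


def matches_as_text(pattern, text):
    """Character-aware match: the dot consumes one whole character."""
    return re.search(pattern, text) is not None


def matches_as_bytes(pattern, text):
    """Byte-wise match: the dot consumes exactly one byte."""
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None


word = "niño"                          # 4 characters, 5 bytes in utf8

print(matches_as_text("ni.o", word))    # dot covers ñ as one character
print(matches_as_bytes("ni.o", word))   # dot covers only half of ñ
print(matches_as_bytes("ni..o", word))  # two dots are needed byte-wise
```

In byte mode a user would have to know that ñ takes two bytes and write /ni..o/, which is exactly what international users should not have to think about.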

This is thinking ahead; I don't expect to implement BOM support tomorrow.

Karl Dahlke
_______________________________________________
Edbrowse-dev mailing list
http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev
