Re: UTF-8 BOM

Allan Odgaard Sat, 27 Oct 2007 06:23:56 -0700

On 27/10/2007, at 14:55, Michel Fortin wrote:

[...]
Now, the interesting question is: what should PHP Markdown (or any Markdown implementation for that matter) do with the UTF-8 BOM? Here are three options:
1. Remove it?
2. Keep it at the start of the text?
3. Ignore it (as it does now)?

Option 3 seems a logical option to me


Yes, ignore it!

[...]
Between option 1 and 2, surely option 1 (dropping the BOM) is the best. Otherwise it'd be hard to concatenate the output with a template HTML document.

And that is why the user should not have placed the BOM in a UTF-8 file in the first place ;)

UTF-8 is an ASCII superset, which means 99% of existing programs that deal with ASCII work flawlessly with UTF-8 text. Add the BOM and you break that: using ‘cat’ to concatenate files will leave BOMs in the middle of the result; use ‘grep’ to extract stuff and you may or may not get a BOM in the result; use a shebang line and you'll find the shell (execv()) won’t actually read it; save your C source with a BOM and gcc will choke on it; etc.
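To make the ‘cat’ point concrete, here is a minimal Python sketch (the helper names `strip_bom` and `concat` are made up for illustration; `concat` just mimics what ‘cat’ does, i.e. plain byte concatenation). Joining two files that each start with a BOM leaves a second BOM in the middle, whereas stripping a single leading BOM first, which is option 1 from the quoted list, gives clean output:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def strip_bom(data: bytes) -> bytes:
    """Drop one leading UTF-8 BOM, if present (hypothetical helper)."""
    return data[len(BOM):] if data.startswith(BOM) else data

def concat(*chunks: bytes) -> bytes:
    """Join chunks the way `cat` does: plain byte concatenation."""
    return b"".join(chunks)

a = BOM + b"# Title\n"
b = BOM + b"Some *text*.\n"

naive = concat(a, b)                        # BOM at start AND mid-stream
clean = concat(strip_bom(a), strip_bom(b))  # no BOM anywhere

print(naive.count(BOM))  # 2
print(clean.count(BOM))  # 0
```

Note that stripping only helps if every tool in the pipeline does it, which is rather Allan's argument for not writing the BOM in the first place.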

The BOM is a byte order mark for UTF-16; it has no place in UTF-8. Some may argue it is there to indicate that the file is UTF-8, but UTF-8 can already be recognized with >99% certainty without the BOM, so the BOM doesn’t really help here. And when text is sent over the wire, there generally is a specified default encoding and a way to change that, which does not include adding garbage to the start of the file (and to the best of my knowledge no standard calls for examining the first 3 bytes to determine the encoding).
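As a rough illustration of recognizing UTF-8 without a BOM, one common heuristic is simply to try a strict UTF-8 decode: text in a legacy 8-bit encoding with accented characters almost never forms valid UTF-8 multi-byte sequences by accident. This Python sketch (function name hypothetical) shows the idea; it does not by itself demonstrate the ">99%" figure, and pure ASCII passes trivially, which is harmless since ASCII is valid UTF-8:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: accept data as UTF-8 if it decodes strictly without errors."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

utf8_text = "Café déjà vu".encode("utf-8")
latin1_text = "Café déjà vu".encode("latin-1")  # é is the lone byte 0xE9 here

print(looks_like_utf8(utf8_text))    # True
print(looks_like_utf8(latin1_text))  # False: 0xE9 is not followed by continuation bytes
```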

[...]
UTF-8 BOM handling sounds like a good thing to add to MDTest too.

I’d say no -- on the contrary, if the user adds a BOM to his UTF-8 file he should be told that this is a bad idea. Fortunately none of the text editors on my system even has this option ;)
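If a tool wanted to "tell the user" rather than silently handle it, a lint-style check could look like this minimal Python sketch. The names `bom_offsets` and `warn_about_bom` are hypothetical, and it reports every BOM, including ones that ended up mid-file after a ‘cat’, rather than only the leading one:

```python
import codecs
import sys
from typing import List

BOM = codecs.BOM_UTF8

def bom_offsets(data: bytes) -> List[int]:
    """Return the byte offset of every UTF-8 BOM (U+FEFF) in data."""
    offsets = []
    pos = data.find(BOM)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(BOM, pos + len(BOM))
    return offsets

def warn_about_bom(name: str, data: bytes) -> None:
    """Print a warning to stderr for each BOM found (hypothetical helper)."""
    for off in bom_offsets(data):
        where = "at start of file" if off == 0 else f"at byte {off}"
        print(f"{name}: UTF-8 BOM found {where}; consider removing it",
              file=sys.stderr)

warn_about_bom("example.text", BOM + b"Hello\n" + BOM + b"World\n")
```

Searching the raw bytes is safe here because UTF-8 is self-synchronizing: the sequence EF BB BF can only occur as an encoded U+FEFF, never straddling other characters.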


_______________________________________________
Markdown-Discuss mailing list
http://six.pairlist.net/mailman/listinfo/markdown-discuss
