James Allwright writes:
| The problem is that there is no standard for the line separator in text
| files (or to be more precise, different OS's use different standards).
| The way round this is to allow any of the following as the separator
On the contrary, if you use the acronyms ASCII or ANSI, there is a very precise standard, and it has been well defined for decades. It isn't the least bit ambiguous on the topic of line separators.

A "standard" is what is defined by a standards organization, in this case a US government committee back in the 1960s. A commercial product may do something different, but if so, it is not "standard".

The characters symbolized by RS (Record Separator, hex 1E) and LF (Line Feed, hex 0A) are the only legal ASCII record separators. CR (hex 0D) is, of course, a legal character, but it is not a legal line separator. Its proper function is to indicate overstrikes. Software that uses CR as part of a line separator is in blatant violation of the ASCII/ANSI standard. Which doesn't stop them from doing it, of course.

You can use any character encoding you please, including one that you made up yourself. But you shouldn't call it "standard" unless it actually follows the legal standard.

The really funny thing about all this is that RS was originally the primary record separator. The ASCII committee grudgingly agreed to allow LF to be used, on the grounds that a lot of terminal devices didn't produce RS from their keyboards. (Just try finding any software anywhere that uses RS. ;-)

|   <lf>
|   <cr>
|   <lf><cr>
|   <cr><lf>

All four of these have been used on various computer systems. A lot of Linux software now notes what is in the input, and tries to produce the same in the output. This can be a bit tricky, especially if the data has been passed around the Net and has a mixture of line separators. But it's probably worth doing if you want your program to be usable anywhere.

This was one of the messier things that I found in writing my ABC Tune Finder. Some software does very, uh, "interesting" things with line separators. I finally threw up my hands, and had the code convert everything to the ANSI standard, with just LF as the terminator. (I did briefly contemplate using RS ... ;-)

There are other standards that define other line separators. For example, in HTML neither CR nor LF (nor RS) is a line separator. HTML uses "<BR>" for that purpose. Where things get really fun is the web servers that send .abc files marked as text/html, although they are actually just plain ASCII text. Grrr ...
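For anyone who wants to try the "accept all four, emit just LF" approach described above, here is a minimal sketch in Python. It is not the Tune Finder's actual code; the names normalize_newlines and dominant_separator are made up for illustration. It assumes the text has already been read without any newline translation by the I/O layer. Note that a mixed run such as LF CR LF is inherently ambiguous; this sketch just pairs characters greedily from the left.

```python
import re
from collections import Counter

# Two-character separators are listed first so that a CR LF or LF CR
# pair is treated as one line break rather than two.
_SEPARATOR = re.compile(r"\r\n|\n\r|\r|\n")


def normalize_newlines(text: str) -> str:
    """Replace every recognised line separator with a single LF."""
    return _SEPARATOR.sub("\n", text)


def dominant_separator(text: str, default: str = "\n") -> str:
    """Return the most frequent separator in text, or default if there is none.

    Useful if you would rather write output in the same style as the input.
    """
    counts = Counter(_SEPARATOR.findall(text))
    return counts.most_common(1)[0][0] if counts else default


if __name__ == "__main__":
    # A made-up ABC fragment with a mixture of separators.
    sample = "X:1\r\nT:Tune\rK:D\n\rM:4/4\n"
    print(repr(normalize_newlines(sample)))
    print(repr(dominant_separator("a\r\nb\r\nc\n")))
```

To reproduce the input's style on output, normalize first, do the processing on LF-only text, and then replace "\n" with whatever dominant_separator reported for the original input.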
