James Allwright writes:
| The problem is that there is no standard for the line separator in text
| files (or to be more precise, different OS's use different standards).
| The way round this is to allow any of the following as the separator

On the contrary, if you use the acronyms ASCII or ANSI, there is a
very precise standard, and it has been well-defined for decades. It
isn't the least bit ambiguous on the topic of line separators.

A "standard" is what is defined by a standards organization, in this
case a US standards committee back in the 1960s. A commercial product
may do something different, but if so, it is not "standard". The
characters symbolized by RS (Record Separator, hex 1E) and LF (Line
Feed, hex 0A) are the only legal ASCII record separators. CR (Carriage
Return, hex 0D) is, of course, a legal character, but it is not a legal
line separator. Its proper function is to indicate overstrikes.

Software that uses CR as part of a line separator is in blatant
violation of the ASCII/ANSI standard. That doesn't stop people from
doing it, of course. You can use any character encoding you please,
including one that you made up yourself. But you shouldn't call it
"standard" unless it actually follows the legal standard.

The really funny thing about all this is that RS was originally the
primary record separator. The ASCII committee grudgingly agreed to
allow LF to be used, on the grounds that a lot of terminal devices
didn't produce RS from their keyboards. (Just try finding any
software anywhere that uses RS. ;-)

| <lf>
| <cr>
| <lf><cr>
| <cr><lf>

All four of these have been used on various computer systems. A lot
of Linux software now detects which separator the input uses and
tries to produce the same in the output. This can be a bit tricky,
especially if the data has been passed around the Net and contains a
mixture of line separators. But it's probably worth doing if you want
your program to be usable anywhere.
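The "produce the same separator as the input" approach might look
something like the Python sketch below. It is only an illustration
under stated assumptions: detect_separator and transform_lines are
hypothetical names, the whole file is assumed to fit in memory as a
str, and mixed input is settled by simply picking whichever of the
four separators occurs most often.

```python
import re
from collections import Counter
from typing import Callable

# The four separators listed above. The two-character forms are tried
# first, so CR LF and LF CR each count as ONE separator, not two.
SEPARATOR_RE = re.compile(r"\r\n|\n\r|\r|\n")

def detect_separator(text: str, default: str = "\n") -> str:
    """Return the separator that occurs most often in text (hypothetical helper)."""
    counts = Counter(SEPARATOR_RE.findall(text))
    if not counts:
        # No separators at all: fall back to plain LF.
        return default
    return counts.most_common(1)[0][0]

def transform_lines(text: str, fn: Callable[[str], str]) -> str:
    """Apply fn to every line, then rejoin with the input's dominant separator."""
    sep = detect_separator(text)
    # re.split with no capturing groups returns just the line contents;
    # a trailing separator yields a final "" so it survives the join.
    lines = SEPARATOR_RE.split(text)
    return sep.join(fn(line) for line in lines)
```

For example, transform_lines(text, str.rstrip) would strip trailing
blanks while leaving a CR LF file as CR LF. Note the ambiguity the
post warns about: in a mixed file, an LF line followed by a CR-only
line is read as a single LF CR separator.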

This was one of the messier things I ran into while writing my ABC
Tune Finder. Some software does very, uh, "interesting" things with
line separators. I finally threw up my hands and had the code convert
everything to the ANSI standard, with just LF as the terminator. (I
did briefly contemplate using RS ... ;-)
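Converting everything to LF can be done in a few lines. The Python
below is a sketch of one way to do it, not the Tune Finder's actual
code; normalize_to_lf is a hypothetical name.

```python
import re

# Any of the four separators: CR LF, LF CR, CR, LF. Longer forms come
# first so a two-character separator is replaced as a unit.
_ANY_SEPARATOR = re.compile(r"\r\n|\n\r|\r|\n")

def normalize_to_lf(text: str) -> str:
    """Rewrite every line separator in text as a single LF (hex 0A)."""
    return _ANY_SEPARATOR.sub("\n", text)
```

Blank lines survive: "x\r\n\r\ny" becomes "x\n\ny", because each CR LF
pair is matched as one separator before a lone CR or LF is considered.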

There are other standards that define other line separators. For
example, in HTML neither CR nor LF (nor RS) is a line separator; HTML
uses "<BR>" for that purpose. Where things get really fun is with web
servers that send .abc files marked as text/html, even though they
are actually just plain ASCII text. Grrr ...
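To illustrate the HTML rule, here is a rough Python sketch, using only
the standard html.parser module, of how text pulled out of an HTML page
splits into lines: only <BR> starts a new line, and CR and LF are just
whitespace. html_to_lines is a hypothetical helper. It drops every tag
other than <BR>, and it ignores block elements such as <P> and the
<PRE> element, inside which HTML does preserve line breaks.

```python
import re
from html.parser import HTMLParser

class _BreakAwareText(HTMLParser):
    """Collects text, starting a new line only at <br> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lines = [[]]  # each entry holds the raw text chunks of one line

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, and by default routes <br/>
        # here too, so <BR>, <br> and <br/> all start a new line.
        if tag == "br":
            self.lines.append([])

    def handle_data(self, data):
        self.lines[-1].append(data)

def html_to_lines(html_text: str) -> list[str]:
    parser = _BreakAwareText()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered at the end
    # CR, LF, tab and space are all ordinary whitespace in HTML:
    # collapse each run to one space, then trim the ends of each line.
    return [re.sub(r"[ \t\r\n]+", " ", "".join(chunks)).strip()
            for chunks in parser.lines]
```

So "X:1<BR>T:Reel\r\n of the\nday" yields two lines, "X:1" and
"T:Reel of the day": the CR LF and LF inside the title count for
nothing, which is exactly why an .abc file served as text/html
collapses into one long line in a browser.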

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
