"Paul Herring" <[EMAIL PROTECTED]> wrote:
> On 3/29/07, glogic_1 <[EMAIL PROTECTED]> wrote:
> > Hey all
> > Just a general query about binary files. I'm aware that
> > getline doesn't work with binary files, but I'm wondering
> > why that is. What actually happens?
> > 
> > I checked it out on the net and all the info I found
> > was a bit over my head. Can anyone break it down into
> > noob English?
> 
> A text file is lines separated by linefeeds (a byte or series
> of bytes "unlikely to occur within a line.")
> 
> A binary file is just a huge 'lump' of data.
> 
> Thus binary files don't have 'lines' as such (though one or a
> few bytes within such a file may happen to match a line
> delimiter, they aren't "seen" as such.)
> 
> If you open a text file in binary mode, linefeeds just 'blur'
> into the rest of the file and 'stop existing' (for the purposes
> of getline() e.g.)

Note that text lines needn't be delimited per se. They can be
fixed-size records, or length-prefixed strings. The SGML
standard, for instance, specifies a record begin character and
record end character for each line.
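
As a rough illustration of those two alternatives, here is a
minimal Python sketch. The 2-byte big-endian length prefix, the
8-byte record size, the space padding, and all function names are
assumptions made up for the example, not any particular file
format.

```python
import io
import struct

def write_length_prefixed(stream, lines):
    """Write each line as a 2-byte big-endian length, then its bytes."""
    for line in lines:
        data = line.encode("ascii")
        stream.write(struct.pack(">H", len(data)))
        stream.write(data)

def read_length_prefixed(stream):
    """Read records back until the stream runs out of headers."""
    lines = []
    while True:
        header = stream.read(2)
        if len(header) < 2:
            break
        (length,) = struct.unpack(">H", header)
        lines.append(stream.read(length).decode("ascii"))
    return lines

def read_fixed_records(data, size):
    """Split data into fixed-size records and strip the space padding."""
    return [data[i:i + size].decode("ascii").rstrip(" ")
            for i in range(0, len(data), size)]

# Length-prefixed: no delimiter byte is written between the lines.
buf = io.BytesIO()
write_length_prefixed(buf, ["first line", "second"])
buf.seek(0)
restored = read_length_prefixed(buf)

# Fixed-size: every record occupies exactly 8 bytes.
fixed = read_fixed_records(b"first   second  ", 8)
```

Incidentally, "first line" is 10 characters long, so its length
prefix ends in byte 0x0A, the same value as an ASCII linefeed. A
reader that hunts for line delimiters would misfire on it.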

> > Also, what do I have to watch out for when porting binary
> > files from one computer architecture to another? Some of
> > the info referred to a danger in doing this but never went
> > into detail.
> > Any ideas what this means?
> 
> There are two issues that spring to mind - one affecting 'binary'
> files, the other 'text' files.
> 
> The only thing I could possibly think of affecting binary files
> would be endianness of numbers stored within the file when moved
> from one architecture to another. (One system thinks the first
> byte is the high byte of an integer, the other thinks it's the
> low byte, for example)

This applies to wide-character text files too.
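
To make the byte-order problem concrete, here is a small Python
sketch using the standard struct module. The value 0x12345678 and
the letter "A" are arbitrary choices for illustration.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # most significant byte first
little = struct.pack("<I", value)   # least significant byte first

# Reading big-endian bytes as if they were little-endian yields a
# different number, not an error.
(misread,) = struct.unpack("<I", big)

# Wide-character text has the same problem: UTF-16 stores each code
# unit as two bytes, and the two variants order them differently.
text_be = "A".encode("utf-16-be")
text_le = "A".encode("utf-16-le")
garbled = text_be.decode("utf-16-le")
```

Here misread comes out as 0x78563412, and garbled is the single
character U+4100 rather than "A". Nothing fails loudly, which is
what makes the problem easy to miss.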

From a C standard point of view, byte size itself, the maximum
length of a text line, and trailing null bytes at the end of a
binary file are issues as well.

> For text files however, different systems (read 'operating
> systems' loosely) have different byte representations of what
> a 'linefeed' is. If you move a text file in binary mode, what
> were linefeeds on the original system become garbage on the
> destination.
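
A minimal Python sketch of that effect, assuming a file written
with DOS-style CR LF line endings. Python is used here only for
brevity: its binary mode hands back the bytes untouched, while its
default text mode translates line endings on input.

```python
import os
import tempfile

dos_bytes = b"one\r\ntwo\r\n"   # CR LF line endings

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(dos_bytes)

    # Binary mode: bytes come back exactly as stored, CR included.
    with open(path, "rb") as f:
        raw_lines = f.readlines()

    # Text mode (default newline handling): CR LF becomes "\n".
    with open(path, "r", encoding="ascii") as f:
        text_lines = f.readlines()
finally:
    os.remove(path)

# What a program expecting bare LF sees after stripping the LF:
leftover = raw_lines[0].rstrip(b"\n")
```

The stray carriage return left in leftover (b"one\r") is the
"garbage" a program sees when it expects only LF as the line
delimiter.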

More broadly, the character coding used for the text itself
can vary from system to system. I'm not just talking about
the difference between ASCII and EBCDIC. Different European
languages use different codings for, say, the letter é.
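
For a concrete picture, this Python sketch encodes the same
characters under a few codecs that ship with Python. The choice
of codecs is mine, picked purely for illustration.

```python
e_acute = "\u00e9"   # the letter e with an acute accent

latin1 = e_acute.encode("latin-1")       # ISO 8859-1
dos = e_acute.encode("cp437")            # original IBM PC code page
mac = e_acute.encode("mac_roman")        # classic Mac OS
utf8 = e_acute.encode("utf-8")           # two bytes instead of one

# Decoding with the wrong table gives a different character
# rather than an error.
latin1_as_dos = latin1.decode("cp437")
utf8_as_latin1 = utf8.decode("latin-1")

# The ASCII versus EBCDIC difference affects even plain letters.
a_ascii = "A".encode("ascii")
a_ebcdic = "A".encode("cp037")
```

The same letter comes out as 0xE9, 0x82, 0x8E, or the two bytes
0xC3 0xA9 depending on the codec, so a file moved byte for byte
between such systems shows the wrong character unless it is
converted.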

-- 
Peter
