Re: Why do binary files contain text but text files don't contain binary?
From a practical point of view, text files contain text that is broken into lines. And by long-standing tradition, line breaks are represented differently on different operating systems. Whenever a text file is transferred between operating systems, the process behind that transfer takes care to convert the line breaks according to the target OS's conventions.

Binary files are much simpler: they can just be transferred without converting anything, even between different operating systems. Of course, this does not mean that an executable for one OS remains a valid executable under another OS, but there are lots of non-executable binaries that are useful independent of the OS (e.g. images, sound files, video files, lots of other application files).

So, for a successful file transfer one needs to know whether the file is text or binary, and handle it accordingly.

--Jörg Knappen

Sent: Friday, 21 February 2020 at 13:21
From: "Costello, Roger L. via Unicode"
To: "unicode@unicode.org"
Subject: Why do binary files contain text but text files don't contain binary?

> Hi Folks,
>
> There are binary files and there are text files.
>
> Binary files often contain portions that are text. For example, the start of Windows executable files is the text MZ.
>
> To the best of my knowledge, text files never contain binary, i.e., bytes that cannot be interpreted as characters. (Of course, text files may contain a text-encoding of binary, such as base64-encoded text.)
>
> Why the asymmetry?
>
> /Roger
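The line-break conversion described in this message can be sketched in a few lines of Python. This is only an illustration, not how any particular transfer tool works: the function name `to_target_newlines` and the sample data are made up for the example. It also shows why the same conversion must not be applied to a binary file, using the 8-byte PNG signature, which happens to contain both a CR LF pair and a lone LF.

```python
def to_target_newlines(data: bytes, newline: bytes) -> bytes:
    """Rewrite every line break (CRLF, CR, or LF) as `newline`.

    Hypothetical helper for illustration only.
    """
    # First normalize all three conventions to a single LF ...
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # ... then emit the convention of the target system.
    return normalized.replace(b"\n", newline)


# A text file written on a Unix-like system, converted for Windows.
unix_text = b"alpha\nbeta\n"
windows_text = to_target_newlines(unix_text, b"\r\n")

# The same conversion applied to binary data damages it:
# the PNG signature no longer matches after a "text" transfer.
png_signature = b"\x89PNG\r\n\x1a\n"
mangled = to_target_newlines(png_signature, b"\n")

print(windows_text)
print(mangled == png_signature)
```

Converting the text in either direction is harmless, while the binary signature is changed by the conversion, which is why the transfer needs to know which kind of file it is handling.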
Re: Why do binary files contain text but text files don't contain binary?
> On 21 Feb 2020, at 13:21, Costello, Roger L. via Unicode wrote:
>
> There are binary files and there are text files.

In C, when a file is opened in binary mode with the function fopen, newlines are left untranslated [1]. Without that option, the file is informally a text file, which means that inside the program one can assume that the newline [2] is the character U+000A LINE FEED (LF).

1. https://en.cppreference.com/w/cpp/io/c/fopen
2. https://en.wikipedia.org/wiki/Newline
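The same text-mode versus binary-mode distinction can be observed from Python, whose built-in `open` is used here as a stand-in for C's `fopen` (the message itself is about C, so this is an analogy, not the C behaviour verbatim). With the default `newline=None`, Python's text mode translates CR LF to LF on reading regardless of platform, which makes the effect easy to see. The file contents are made up for the example.

```python
import os
import tempfile

# Raw bytes that use Windows-style CRLF line endings.
raw = b"first line\r\nsecond line\r\n"

# Write the bytes to a temporary file; the with-block closes it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Binary mode: the bytes come back exactly as stored.
with open(path, "rb") as f:
    binary_view = f.read()

# Text mode with the default newline=None: "\r\n" is read as "\n",
# so the program only ever sees U+000A as the line break.
with open(path, "r", encoding="utf-8") as f:
    text_view = f.read()

os.remove(path)

print(repr(binary_view))
print(repr(text_view))
```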
RE: Why do binary files contain text but text files don't contain binary?
Costello, Roger L. wrote:

> Text files may indeed contain binary (i.e., bytes that are not
> interpretable as characters). Namely, text files may contain newlines,
> tabs, and some other invisible things.
>
> Question: "characters" are defined as only the visible things, right?

In addition to this being incorrect, as Ken and Richard (so far) have pointed out, this isn't the distinction you are looking for.

All file formats contain data which is relevant to that file format. Zip files, executables, JPEGs, MP4s, all contain specific data structured in a specific way. If any of them has that structure interrupted by random bytes, the format has been broken and the file is corrupt.

It is no different for text data, which is expected to contain certain bytes and is normally not expected to be interrupted by a series of ranëH‰UÀHƒÈÿH

Does that help?

--Doug Ewell | Thornton, CO, US | ewellic.org
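One way to see "text interrupted by random bytes is corrupt" concretely is to treat UTF-8 as the file format and run a strict decode. This is a minimal sketch assuming the text is meant to be UTF-8; the helper name `is_valid_utf8` and the injected bytes are invented for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


clean = "interrupted by a series of random bytes".encode("utf-8")

# Splice a few bytes into the middle; 0xFF never occurs in valid UTF-8.
corrupt = clean[:20] + b"\xff\xfe\x00\x80" + clean[20:]

print(is_valid_utf8(clean))
print(is_valid_utf8(corrupt))
```

Note that passing this check only shows the bytes are well-formed UTF-8. It does not prove the file was intended as text: the two bytes "MZ" at the start of an executable pass it too.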
Re: Why do binary files contain text but text files don't contain binary?
On 2/21/2020 7:53 AM, Costello, Roger L. via Unicode wrote:

> Text files may indeed contain binary (i.e., bytes that are not interpretable as characters). Namely, text files may contain newlines, tabs, and some other invisible things.
>
> Question: "characters" are defined as only the visible things, right?

No. You've gone astray right there. Please read Chapter 2 of the Unicode Standard, and in particular, Section 2.4, Code Points and Characters:

https://www.unicode.org/versions/Unicode12.0.0/ch02.pdf#G25564

All of those types of characters can occur in Unicode plain text. (With the exception of surrogate code points.)

> I conclude: Binary files may contain arbitrary text.

Binary files can contain *whatever*, including text.

> Text files may contain binary, but only a restricted set of binary.

The distinction is definitional. A text file contains *only* characters, interpretable by a specific character encoding (usually Unicode, these days). But a text file need not be "plain text". An HTML file is an example of a text file (it contains only a sequence of characters, whose identity and interpretation is all clearly specified by looking them up in the Unicode Standard), but it is not *plain* text. It is *rich* text, consisting of markup tags interspersed with runs of plain text.

Another distinction that may be leading you astray is the distinction between binary file transfer and text file transfer. If you are using ftp, for example, you can specify use of binary file transfer, *even if* the file you are transferring is actually a text file. That simply means that the file transfer will agree to treat the entire file as a binary blob and transfer it byte-for-byte intact. A text file transfer, on the other hand, may look for "lines" in a text file and may adjust line endings to suit the receiving platform conventions.

> Do you agree?

No.

--Ken
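The remark about surrogate code points can be checked with a small experiment. The sketch below uses Python's strict UTF-8 codec as a rough proxy for "can occur in Unicode plain text"; that proxy is an assumption of the example, not a definition from the Standard, and the helper name `encodes_as_utf8` is made up.

```python
def encodes_as_utf8(ch: str) -> bool:
    """Return True if the code point can be encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        ch.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


samples = {
    "TAB": "\t",
    "LINE FEED": "\n",
    "NULL": "\x00",
    "highest code point": "\U0010FFFF",
    "surrogate U+D800": "\ud800",
}

# Control characters and invisible code points encode fine;
# only the lone surrogate is rejected by the codec.
results = {label: encodes_as_utf8(ch) for label, ch in samples.items()}

for label, ok in results.items():
    print(f"{label}: {ok}")
```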
Re: Why do binary files contain text but text files don't contain binary?
On Fri, 21 Feb 2020 15:53:52 + "Costello, Roger L. via Unicode" wrote:

> Based on a private correspondence, I now realize that this statement:
>
> > Text files do not contain binary
>
> is not correct.
>
> Text files may indeed contain binary (i.e., bytes that are not
> interpretable as characters). Namely, text files may contain
> newlines, tabs, and some other invisible things.
>
> Question: "characters" are defined as only the visible things, right?

No, white space (e.g. spaces, tabs and newlines) is normally considered to be composed of characters. And then there are much harder to discern things, such as zero-width spaces, line-break suppressors such as U+2060 WORD JOINER, and soft hyphens (interpreted as line-break opportunities).

Richard.
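That these invisible things are ordinary characters can be confirmed by looking them up in the Unicode Character Database, for instance through Python's standard `unicodedata` module. The sketch below prints the General Category of each code point mentioned above (Cc is a control character, Zs a space separator, Cf a format character); the choice of samples is just for illustration.

```python
import unicodedata

# Invisible code points: tab, newline, space, ZERO WIDTH SPACE,
# WORD JOINER, and SOFT HYPHEN.
invisible = ["\t", "\n", " ", "\u200b", "\u2060", "\u00ad"]

# Map each code point to its General Category.
categories = {
    f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in invisible
}

for code_point, category in categories.items():
    print(code_point, category)

# Non-control characters also have formal names.
print(unicodedata.name("\u2060"))
```

Every one of them has an assigned category, so none of them is "binary" in the sense of bytes that cannot be interpreted as characters.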
RE: Why do binary files contain text but text files don't contain binary?
Based on a private correspondence, I now realize that this statement:

> Text files do not contain binary

is not correct.

Text files may indeed contain binary (i.e., bytes that are not interpretable as characters). Namely, text files may contain newlines, tabs, and some other invisible things.

Question: "characters" are defined as only the visible things, right?

I conclude: Binary files may contain arbitrary text. Text files may contain binary, but only a restricted set of binary.

Do you agree?

/Roger

From: Costello, Roger L.
Sent: Friday, February 21, 2020 7:22 AM
To: unicode@unicode.org
Subject: Why do binary files contain text but text files don't contain binary?

> Hi Folks,
>
> There are binary files and there are text files.
>
> Binary files often contain portions that are text. For example, the start of Windows executable files is the text MZ.
>
> To the best of my knowledge, text files never contain binary, i.e., bytes that cannot be interpreted as characters. (Of course, text files may contain a text-encoding of binary, such as base64-encoded text.)
>
> Why the asymmetry?
>
> /Roger
Re: Why do binary files contain text but text files don't contain binary?
Dear Roger,

because when Unicode is used in real life (UTF-8 etc.), text ⊂ binary.

John Knightley

On 2020-02-21 20:21, Costello, Roger L. via Unicode wrote:

> Hi Folks,
>
> There are binary files and there are text files.
>
> Binary files often contain portions that are text. For example, the start of Windows executable files is the text MZ.
>
> To the best of my knowledge, text files never contain binary, i.e., bytes that cannot be interpreted as characters. (Of course, text files may contain a text-encoding of binary, such as base64-encoded text.)
>
> Why the asymmetry?
>
> /Roger
Why do binary files contain text but text files don't contain binary?
Hi Folks, There are binary files and there are text files. Binary files often contain portions that are text. For example, the start of Windows executable files is the text MZ. To the best of my knowledge, text files never contain binary, i.e., bytes that cannot be interpreted as characters. (Of course, text files may contain a text-encoding of binary, such as base64-encoded text.) Why the asymmetry? /Roger
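Both observations in this message, the "MZ" text at the start of a binary and base64 as a text encoding of binary, can be illustrated briefly. The four sample bytes below are made up for the example (only the leading "MZ" comes from the message), and the sketch uses Python's standard `base64` module.

```python
import base64

# Hypothetical first bytes of an executable: "MZ" plus two non-text bytes.
header = b"MZ\x90\x00"

# The first two bytes happen to be readable as ASCII text.
magic = header[:2].decode("ascii")

# The whole header is not valid UTF-8 text because of the 0x90 byte ...
try:
    header.decode("utf-8")
    header_is_text = True
except UnicodeDecodeError:
    header_is_text = False

# ... but base64 turns any bytes into plain ASCII characters,
# and decoding restores the original bytes exactly.
encoded = base64.b64encode(header).decode("ascii")
restored = base64.b64decode(encoded)

print(magic, header_is_text, encoded, restored == header)
```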