Hello Data Format Experts!
[Definition] Property: an attribute, quality, or characteristic of something.
JPEG is a binary data format.
CSV is a text data format.
Question #1: Is the binaryness/textness of a data format a property?
Question #2: If the answer to Question #1 is yes, then what is
Based on a private correspondence, I now realize that this statement:
> Text files do not contain binary
is not correct.
Text files may indeed contain binary (i.e., bytes that are not interpretable as
characters). Namely, text files may contain newlines, tabs, and some other
invisible
Hi Folks,
There are binary files and there are text files.
Binary files often contain portions that are text. For example, the start of
Windows executable files is the text MZ.
To the best of my knowledge, text files never contain binary, i.e., bytes that
cannot be interpreted as characters.
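One way to test whether a run of bytes "can be interpreted as characters" is to attempt a strict decode. The sketch below is only an illustration: it assumes UTF-8 as the character encoding (the message does not name one), and the helper name is_decodable_text and the sample bytes are made up for this example.

```python
def is_decodable_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if the whole byte string decodes under the given encoding."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# "MZ" is the two-byte signature at the start of Windows executables;
# both bytes are plain ASCII letters.
mz_signature = b"MZ"

# Hypothetical sample: 0x90 is a stray continuation byte and 0xFF never
# occurs in well-formed UTF-8, so this cannot be decoded as UTF-8 text.
not_text = b"MZ\x90\x00\xff"

print(is_decodable_text(mz_signature))  # True
print(is_decodable_text(not_text))      # False
```

Note that the answer depends on the encoding chosen: the same bytes that fail as UTF-8 may decode without error under a single-byte encoding such as Latin-1.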
>From the book titled "Computer Power and Human Reason" by Joseph Weizenbaum,
>p.74-75
Suppose that the alphabet with which we wish to concern ourselves consists of
256 distinct symbols. Imagine that we have a deck of 256 cards, each of which
has a distinct symbol of our alphabet printed on
Hi Folks,
Today I received an email from the Unicode organization. The email said this:
(italics and yellow highlighting are mine)
The Unicode Standard is the foundation for all modern software and
communications around the world, including all modern operating systems,
browsers, laptops, and
Hello Unicode experts!
Which is correct:
(a) The input file contains a string. The string is encoded using UTF-8.
(b) The input file contains a string. The string is encoded with UTF-8.
(c) The input file contains a string. The string is encoded in UTF-8.
(d) Something else (what?)
/Roger
Hello Unicode Experts!
As I understand it, endian-ness applies to multi-byte words.
Endian-ness does not apply to ASCII characters because each character is a
single byte.
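A small Python sketch can make the byte-order point concrete. It uses only the standard codecs, and the sample character is an arbitrary choice: the ASCII encoding of a character is one byte, so there is nothing to reorder, while the two-byte and four-byte code units of UTF-16 and UTF-32 come out in opposite orders for the BE and LE variants.

```python
ch = "A"  # U+0041, an arbitrary sample character

ascii_bytes = ch.encode("ascii")     # one byte per character
utf16_be = ch.encode("utf-16-be")    # most significant byte first
utf16_le = ch.encode("utf-16-le")    # least significant byte first
utf32_be = ch.encode("utf-32-be")
utf32_le = ch.encode("utf-32-le")

print(ascii_bytes.hex())  # 41
print(utf16_be.hex())     # 0041
print(utf16_le.hex())     # 4100
print(utf32_be.hex())     # 00000041
print(utf32_le.hex())     # 41000000
```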
Endian-ness does apply to UTF-16BE (Big-Endian), UTF-16LE (Little-Endian),
UTF-32BE and UTF-32LE because each character
Hi Folks,
Thank you for your outstanding responses!
Below is a summary of what I learned. Are there any errors in the summary? Is
there anything you would add? Please let me know of anything that is not clear.
/Roger
1. While base64 encoding is usually applied to binary, it is also
Hi Unicode Experts,
Suppose base64 encoding is applied to m to yield base64 text t.
Next, suppose base64 encoding is applied to m' to yield base64 text t'.
If m is not equal to m', then t will not equal t'.
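The claim can be spot-checked with Python's standard base64 module. This is a sketch, not a proof: the two sample inputs are arbitrary, and the supporting argument is that decoding recovers the input exactly, so two different inputs cannot share one encoded text.

```python
import base64

# Two arbitrary sample inputs that differ only in their last byte.
m1 = b"hello"
m2 = b"hellp"

t1 = base64.b64encode(m1)
t2 = base64.b64encode(m2)

# Different inputs give different base64 text.
assert t1 != t2

# Decoding inverts encoding, which is why a collision is impossible:
# if t1 == t2 held, then m1 == b64decode(t1) == b64decode(t2) == m2.
assert base64.b64decode(t1) == m1
assert base64.b64decode(t2) == m2
```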
In other words, given different inputs, base64 encoding always yields different
Hi Folks,
Thank you very much for your fantastic comments!
Below I summarized the issue and your comments. At the bottom is a set of
proposed requirements (for my clients) on applications that receive iCalendar
files.
Some questions:
- Have I captured all your comments? Any more comments?
-
Hello Unicode Experts!
Suppose an application splits a UTF-8 multi-octet sequence. The application
then sends the split sequence to a client. The client must restore the original
sequence.
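As a rough illustration of the scenario, the Python sketch below splits the three-byte UTF-8 sequence for the euro sign and has the receiving side reassemble it with an incremental decoder. It assumes the pieces arrive complete and in their original order; the function name reassemble and the sample character are made up for this example.

```python
import codecs


def reassemble(chunks):
    """Decode UTF-8 byte chunks that may split a multi-octet sequence."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    # The decoder buffers an incomplete sequence until the rest arrives.
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises UnicodeDecodeError if a sequence is left unfinished.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


euro = "\u20ac".encode("utf-8")   # b'\xe2\x82\xac', a three-octet sequence
chunks = [euro[:1], euro[1:]]     # split after the lead byte

print(reassemble(chunks) == "\u20ac")  # True
```

Under these assumptions the split is harmless, because concatenating the pieces in order restores the original octets.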
Question: is it possible to split a UTF-8 multi-octet sequence in such a way
that the client cannot