Michael Jennings <[EMAIL PROTECTED]> writes:

> The issue centers on the documentation. Philosophically, in my
> opinion, a program should be written so the documentation is easy to
> read. In this case a hidden stripping of useless characters means
> that there is one less thing to explain in the manual.

No, it's one *more* thing to explain in the manual.  The only
characters universally agreed to be "useless" in the context of
parsing are the whitespace characters.  *Everything* else requires
serious consideration before it is discarded.

For example, "control characters" for you might be UTF8-encoded
characters for someone else.  Not stripping them away without a very
good reason to do so is for me a simple matter of correctness.
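
To make the point concrete, here is a minimal sketch in Python (not Wget
code).  The filter `strip_control_bytes` is hypothetical and only stands
in for the kind of "hidden stripping" being proposed; it assumes that
"control characters" means the byte ranges 0x00-0x1F, 0x7F, and
0x80-0x9F, with tab, LF, and CR kept.

```python
def strip_control_bytes(data: bytes) -> bytes:
    """Hypothetical naive filter: drop C0/C1 control bytes, keep tab/LF/CR."""
    keep = {0x09, 0x0A, 0x0D}
    return bytes(
        b for b in data
        if b in keep or not (b < 0x20 or 0x7F <= b <= 0x9F)
    )


# The euro sign is encoded in UTF-8 as the three bytes E2 82 AC.
# The middle byte, 0x82, falls inside the "control" range above.
original = "price: 5 €\n".encode("utf-8")
stripped = strip_control_bytes(original)

try:
    stripped.decode("utf-8")
    survived = True
except UnicodeDecodeError:
    # The filter removed 0x82, leaving a truncated multibyte sequence.
    survived = False
```

The same filter does remove a stray ^Z (byte 0x1A), but it also turns
valid UTF-8 input into an invalid byte sequence, which is the
correctness problem described above.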

The GNU coding standards seem to suggest the same.

    (...) Or go for generality.  For example, Unix programs often have
    static tables or fixed-size strings, which make for arbitrary
    limits; use dynamic allocation instead.  Make sure your program
    handles NULs and other funny characters in the input files.  Add a
    programming language for extensibility and write part of the
    program in that language.

and:

    Utilities reading files should not drop NUL characters, or any
    other nonprinting characters _including those with codes above
    0177_.  The only sensible exceptions would be utilities
    specifically intended for interface to certain types of terminals
    or printers that can't handle those characters.  Whenever
    possible, try to make programs work properly with sequences of
    bytes that represent multibyte characters, using encodings such as
    UTF-8 and others.

> There is precedent for this. Microsoft Windows is in some places
> written to get around shortcomings in the processors on which it
> runs. Such accommodation puts quirkiness in the code, but it gets
> the job done.

In many cases Wget tries to adapt to its environment to ensure
smoother operation.  But with each such accommodation we are forced to
weigh the added "quirkiness" (entropy) of the code against the
benefit.

In this case, implementing correct support for ^Z is not exactly
trivial, and the benefit is minimal -- ^Z characters don't even
appear in files normally created on the platforms supported by Wget,
which are Unix and Windows.

You are trying to convince us otherwise by offering an easier
implementation of ^Z, thereby reducing the costs.  But unfortunately
this easier implementation reduces the correctness of the code, and is
therefore not an option.  Sorry.
