John,  I truly do not think you need to worry so much about UUENCODE.
I suggest you remain skeptical,  but not about UUE.   It is only
one example,  and is much less problematic than others.

> I think I used CP500 on OS/2.

I think CP500 is an EBCDIC codepage,  not an ASCII codepage.
(Loose use of the terms "EBCDIC" and "ASCII".  Please be gentle.)

> On Linux, ISO8859-15 representations, and before that ISO8859-1.
> Except where software (such as this email client) has its own mind
> about things and uses us-ascii.

Hopefully here,  as in other strategic places,
this is only a labelling issue and the content is not altered.
When going between an "ASCII base" and an "EBCDIC base"
one must translate,  and all I'm saying is that we should use a
one-for-one reversible table.   To establish such a table I used
CP 37 v2 to represent EBCDIC and ISO 8859-1 to represent ASCII.
Once the table is established,  the labels can be discarded
and executables built with that table will work just fine.
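The one-for-one table idea can be sketched with Python's built-in codecs, using 'cp037' and 'latin-1' as stand-ins for CP 37 v2 and ISO 8859-1 (Python's cp037 table may differ in detail from the exact table I used, but the principle is the same):

```python
# Sketch of a one-for-one reversible translation table, using Python's
# built-in 'cp037' and 'latin-1' codecs as stand-ins for CP 37 v2 and
# ISO 8859-1.  Both codecs cover all 256 byte values one-for-one.

def ascii_to_ebcdic(data: bytes) -> bytes:
    # Interpret each byte as an ISO 8859-1 character, then re-emit
    # that character as its CP 37 byte value.
    return data.decode('latin-1').encode('cp037')

def ebcdic_to_ascii(data: bytes) -> bytes:
    return data.decode('cp037').encode('latin-1')

# Because the mapping is a bijection on all 256 byte values,
# the round trip is lossless for any byte string:
for i in range(256):
    b = bytes([i])
    assert ebcdic_to_ascii(ascii_to_ebcdic(b)) == b
```

Once a table like this is fixed into the executables, no label is needed: every byte value translates to exactly one byte value and back.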

The worst I get from this  (being a heavy Pine user)
is a nasty message  "I don't know how to handle that codepage
so it may look funny on your screen".   Clearly the agent
(in this case Pine)  is leaving the data intact.   Good!

> When I build a kernel the only place there's mention of codepages
> is WRT DOS filesystems.
>
> I've just checked what modules I have loaded:
> nls_cp437               4320   0  (autoclean)
> nls_iso8859-1           2816   0  (autoclean)

And what does this mean?
Does the FS driver perform translation?   Ouch!

> As I noted to you offlist, the encoding performed is a binary one
> and takes no notice of codepages or such. That comes in only when
> the result is displayed.

Right.   See my comments above about  "establish a table".
I know what my own code does,  and for that we are on the same page.
(Pun not intended,  but noted.)   NLS support in the CMS FS is not
planned,  though I do wonder what effect NLS support has in other FS drivers.
CMS FS does employ translation,  but what it does
should not break UUENCODEd content.   (I have not tested.)
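A quick way one could test that claim, again with Python's codecs standing in for the real tables: uuencode some binary data, push the encoded text through a reversible A/E/A round trip, and decode it again.

```python
import binascii

def to_ebcdic(text_bytes: bytes) -> bytes:
    # Stand-in for a reversible ASCII->EBCDIC table (latin-1 <-> cp037).
    return text_bytes.decode('latin-1').encode('cp037')

def to_ascii(text_bytes: bytes) -> bytes:
    return text_bytes.decode('cp037').encode('latin-1')

payload = bytes(range(45))            # arbitrary binary data (45 bytes, one uu line)
uu_line = binascii.b2a_uu(payload)    # UUENCODE it

# Carry the encoded *text* across an A/E boundary and back with a
# one-for-one table; the binary payload survives intact.
round_tripped = to_ascii(to_ebcdic(uu_line))
assert binascii.a2b_uu(round_tripped) == payload
```

The assertion holds precisely because the table is a bijection; a lossy or many-to-one table would corrupt the uuencoded characters and the decode would fail.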

There seem to be two places where "codepages" are considered:
one where translation MUST be applied,  as in crossing an A/E boundary,
and another where translation MAY be applied,  such as nationalization.
This is not to say that the latter is unimportant:  surely my
Israeli friends would like to see Hebrew presented correctly.
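The "MAY be applied" case is purely a matter of presentation,  and a one-byte example shows why the stored data need not change.  (Byte values per the ISO 8859 tables;  Python used only to illustrate.)

```python
# The same stored byte means different things only at display time.
# 0xA4 is the generic currency sign in ISO 8859-1 but the euro sign
# in ISO 8859-15; the data on disk need not change at all.
b = b'\xa4'
print(b.decode('iso8859-1'))     # prints the currency sign, U+00A4
print(b.decode('iso8859-15'))    # prints the euro sign, U+20AC
```

Defer the codepage to the moment of display and the byte itself stays untouched,  exactly as the label-only cases above.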

I submit to you all that,  except for crossing an A/E boundary,
codepage application should be deferred until the latest possible moment,
not unlike how fonts are applied to web content.   Choosing any other
point in the processing to insert NLS handling leads to  1)  increased
code complexity,  2)  increased processing load,  and  3)  content
corruption.   For those with 1GHz and faster Pentiums,  processing load
may not seem to matter,  but it does when scalability returns to prominence.
Code complexity leads to miserable programmers making miserable products.
And data corruption should strike fear in all our hearts.
