On Mar 6, 2010, at 13:05, Bob Cronin wrote:

> Yes in general I don't have a filetype. The application is an email gateway.
> From the responses so far it seems like heuristics are the only approach. I
> was hoping there might be something more deterministic (although I suspected
> probably not).
>  
Alas, certainly not.  The best you can hope for is that if your
file contains a character at a code point invalid in some code
pages, you can eliminate those code pages from consideration.
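
A rough sketch of that elimination step, for illustration only: try a strict decode under each candidate and drop any candidate that rejects a byte. The function name and the candidate list are hypothetical, and Python's codec names stand in for the code pages.

```python
def surviving_code_pages(data, candidates):
    """Return the candidates under which every byte of data is valid."""
    survivors = []
    for name in candidates:
        try:
            data.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue  # some byte is invalid in this code page: eliminate it
        survivors.append(name)
    return survivors


# "Hello" encoded in EBCDIC code page 037
sample = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])
remaining = surviving_code_pages(sample, ["ascii", "utf-8", "cp037", "cp500"])
print(remaining)
```

Note how weak the filter is. Python's cp037 and cp500 tables assign a character to all 256 byte values, so neither can ever be eliminated this way. The test only rules out the non-EBCDIC candidates here.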

You should give the user an optional way to specify the code
page.

What do you do if you know the EBCDIC code page?  Translate it
to an ASCII or Unicode page which supports all the characters
in the EBCDIC page?
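
If the code page is known (or supplied by the user), the translation itself is mechanical. A minimal sketch, assuming Unicode (UTF-8) as the target, since it covers every character of any EBCDIC page. The function name and the default of cp037 are assumptions for illustration, not anything from this thread.

```python
DEFAULT_CODE_PAGE = "cp037"  # assumed site default; purely illustrative


def ebcdic_to_utf8(data, code_page=None):
    """Decode EBCDIC bytes with the given (or default) code page, re-encode as UTF-8."""
    text = data.decode(code_page or DEFAULT_CODE_PAGE)
    return text.encode("utf-8")


# Byte 0x4A is the cent sign in code page 037 but '[' in code page 500.
print(ebcdic_to_utf8(b"\x4a"))
print(ebcdic_to_utf8(b"\x4a", code_page="cp500"))
```

The cent sign has no US-ASCII equivalent, which is why a plain ASCII target would lose information for this page.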

(Wandering off-topic)  I just performed an experiment to confirm
an ugly suspicion.  From an ASCII system, I sent a mail message
which contained the MIME headers:

    Content-Type: text/plain;
        charset=us-ascii
    Content-Transfer-Encoding: quoted-printable

... It arrived at a VM system with those headers transformed to:

    Content-transfer-encoding: 7BIT
    Content-type: text/plain; CHARSET=US-ASCII

Ummm...  But it's sitting in my reader as an EBCDIC file.  Shouldn't
whatever agent transformed it from us-ascii to EBCDIC have adjusted
the headers to:

    Content-transfer-encoding: 8BIT
    Content-type: text/plain; CHARSET=IBM-1047

or:

    Content-transfer-encoding: 8BIT
    Content-type: text/plain; CHARSET=IBM-37-2

Whatever?  Once the transformation is performed, US-ASCII is a
lie, and there's no way EBCDIC fits in 7 bits.
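
To make the 7-bit point concrete, here is a small check, again only a sketch with a hypothetical helper name. It encodes a short text in code page 037 and tests whether every byte is below 0x80.

```python
def fits_in_7bit(data):
    """True if every byte has its high bit clear (values 0x00-0x7F)."""
    return all(b < 0x80 for b in data)


ascii_body = "Hello, world".encode("ascii")
ebcdic_body = "Hello, world".encode("cp037")

print(fits_in_7bit(ascii_body))
print(fits_in_7bit(ebcdic_body))
```

In EBCDIC the letters and digits all sit at 0x81 and above, so any ordinary text fails the check.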

I wonder what it would have done to the body and the headers if
the receiving VM system had been in Japan, using EBCDIC 939?

-- gil
