Yes, in general I don't have a filetype. The application is an email gateway.
From the responses so far it seems like heuristics are the only approach. I
was hoping there might be something more deterministic (although I suspected
probably not).
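
For reference, the square-bracket heuristic quoted below could be sketched
roughly as follows. This is only an illustration in Python, not anything the
gateway actually runs; the function name guess_ebcdic_code_page is made up
for the example, and it simply counts the byte values mentioned in the
thread (AD/BD for 1047, BA/BB for CP 37).

```python
def guess_ebcdic_code_page(data: bytes) -> str:
    """Guess between CP 1047 and CP 037 from where the square brackets land.

    Hypothetical helper: a heuristic only, most useful on bracket-rich
    text such as C source.
    """
    # In CP 1047, '[' and ']' are X'AD' and X'BD'.
    score_1047 = data.count(b"\xad") + data.count(b"\xbd")
    # In CP 037, '[' and ']' are X'BA' and X'BB'.
    score_037 = data.count(b"\xba") + data.count(b"\xbb")

    if score_1047 > score_037:
        return "1047"
    if score_037 > score_1047:
        return "037"
    # No brackets at all, or a tie: the heuristic has nothing to say.
    return "unknown"
```

A message body with no brackets (or an even split) comes back as "unknown",
so this can only ever be one hint among several.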
--
bc

On Sat, Mar 6, 2010 at 1:47 PM, Paul Gilmartin <[email protected]> wrote:

> On Mar 6, 2010, at 10:10, Richard Troth wrote:
> >
> >
> > The quickest trigger for me, w/r/t which EBCDIC code page was used, is
> > where the square brackets land.  If I see AD and BD, then it's
> > probably 1047.  If I see BA and BB, then it's a good guess that it's
> > CP 37.  But most of my world involves C source or other things rich in
> > square brackets.  YMMV.  Scanning for "not" is another helpful hint.
> >
> That sounds like a good technique for C source code.  And one
> can confirm one's guess of "C" by looking for strings such as
> "#include", "int", and "/*".  What Rexx characters are code-page
> sensitive?
>
> Similarly, one might recognize Rexx by the "/* Rexx */" initial
> comment, "DO", "END", etc.
>
> Are you in a place where the CMS filetype is no help, such as
> a pipelines data stream?
>
> Other languages?  Do you want to ignore comments, which skew
> the statistics?
>
> UTF-8 rules!  Is there PIPE XLATE FROM 1047 to UTF-8?  (Available
> in z/OS iconv().)  But is your mail agent RFC 1652 savvy?
>
> -- gil
>
