Yes, in general I don't have a filetype. The application is an email gateway. From the responses so far, it seems like heuristics are the only approach. I was hoping there might be something more deterministic (although I suspected there probably wasn't).

-- bc
On Sat, Mar 6, 2010 at 1:47 PM, Paul Gilmartin <[email protected]> wrote:
> On Mar 6, 2010, at 10:10, Richard Troth wrote:
>
> > The quickest trigger for me, w/r/t which EBCDIC code page was used, is
> > where the square brackets land. If I see AD and BD, then it's
> > probably 1047. If I see BA and BB, then it's a good guess that it's
> > CP 37. But most of my world involves C source or other things rich in
> > square brackets. YMMV. Scanning for "not" is another helpful hint.
> >
> That sounds like a good technique for C source code. And one
> can confirm one's guess of "C" by looking for strings such as
> "#include", "int", and "/*". What Rexx characters are code-page
> sensitive?
>
> Similarly, one might recognize Rexx by the "/* Rexx */" initial
> comment, "DO", "END", etc.
>
> Are you in a place where the CMS filetype is no help, such as
> a pipelines data stream?
>
> Other languages? Do you want to ignore comments, which skew
> the statistics?
>
> UTF-8 rules! Is there PIPE XLATE FROM 1047 to UTF-8? (Available
> in z/OS iconv().) But is your mail agent RFC 1652 savvy?
>
> -- gil
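For what it's worth, here is a minimal Python sketch of the two heuristics from the thread, assuming the message body arrives as raw EBCDIC bytes. The bracket byte values (AD/BD suggesting 1047, BA/BB suggesting CP 37) come from Richard's note; the function names, the tie rule, and the keyword checks are illustrative assumptions, so treat it as a guesser, not a deterministic detector. Python ships a cp037 codec but no 1047 codec, so the keyword check decodes with cp037; that is good enough here because letters, digits, '#', '/' and '*' sit at the same code points in both pages.

```python
from typing import Optional

# EBCDIC byte values for '[' and ']' per the thread:
# AD/BD suggests IBM-1047, BA/BB suggests CP 37.
BRACKET_BYTES = {
    "IBM-1047": (0xAD, 0xBD),
    "IBM-037": (0xBA, 0xBB),
}


def guess_code_page(data: bytes) -> Optional[str]:
    """Guess the code page from which bracket byte pair occurs most often."""
    scores = {
        cp: data.count(lb) + data.count(rb)
        for cp, (lb, rb) in BRACKET_BYTES.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]
    if best_score == 0 or best_score == runner_up:
        return None  # no brackets, or a tie: no evidence either way
    return best


def guess_language(data: bytes) -> Optional[str]:
    """Very rough language hint (hypothetical keyword rules)."""
    # cp037 maps all 256 byte values, so this decode never raises.
    text = data.decode("cp037")
    if text.lstrip().upper().startswith("/* REXX"):
        return "Rexx"
    if "#include" in text or ("/*" in text and "int " in text):
        return "C"
    return None
```

Text with no square brackets yields None from guess_code_page, which is the "only heuristics" problem in a nutshell; the placement of the "not" sign could be added as a second signal in the same scoring style.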
