USS does support Unicode ... sort of. Then again, any support of Unicode these days is still "sort of", so I don't mean to slam USS's support of it. It's just that it will (still) be heavily entrenched in non-Unicode compatibility requirements (as you would probably expect). Look for UTF-8 and how to enable it.
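If it helps, here is a minimal C sketch of pushing EBCDIC through the POSIX iconv API under USS to get UTF-8 out the other side. Treat the codeset names ("IBM-1047", "UTF-8") as assumptions about what your installation registers; adjust them to whatever your data actually uses:

    #include <stdio.h>
    #include <iconv.h>

    /* Convert stdin (assumed single-byte EBCDIC, here IBM-1047)
     * to UTF-8 on stdout.  Error handling kept minimal on purpose. */
    int main(void)
    {
        char in[4096], out[16384];   /* 4x output room is plenty */
        size_t n;
        iconv_t cd;

        cd = iconv_open("UTF-8", "IBM-1047");   /* to-codeset, from-codeset */
        if (cd == (iconv_t)-1) {
            perror("iconv_open");
            return 1;
        }

        while ((n = fread(in, 1, sizeof in, stdin)) > 0) {
            char *ip = in, *op = out;
            size_t ileft = n, oleft = sizeof out;

            /* Single-byte input, so chunk boundaries can't split a character. */
            if (iconv(cd, &ip, &ileft, &op, &oleft) == (size_t)-1) {
                perror("iconv");
                iconv_close(cd);
                return 1;
            }
            fwrite(out, 1, sizeof out - oleft, stdout);
        }

        iconv_close(cd);
        return 0;
    }

Compile with c89 (or whatever C compiler your shop uses) and pipe a 1047-encoded file through it.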
The quickest trigger for me, w/r/t which EBCDIC code page was used, is where the square brackets land. If I see AD and BD, then it's probably 1047. If I see BA and BB, then it's a good guess that it's CP 37. But most of my world involves C source or other things rich in square brackets. YMMV. Scanning for "not" is another helpful hint. (A rough C sketch of the bracket test follows the quoted note below.)

-- R; <><

On Fri, Mar 5, 2010 at 13:38, Bob Cronin <[email protected]> wrote:
> This is not specifically pipeline-related (although if there's a solution,
> I'll likely implement it using pipelines). I'd just like to pick the brains
> of a lot of very smart people with lots of IBM mainframe experience ...
>
> Can anyone suggest possible approaches to the problem of examining an
> arbitrary collection of EBCDIC text (all presumed to have been prepared
> using the same codepage) and somehow determining which codepage that was?
> ASCII mail clients (such as Lotus Notes) have functionality to choose a
> "best match" ASCII character set to use for Internet mail. I would like to
> be able to do the same thing for EBCDIC (so that when I convert it to ASCII,
> I choose an EBCDIC-to-ASCII translation table that has the maximum
> probability of delivering the correct characters). I need to detect both
> single- and double-byte EBCDIC encodings. At present I use a somewhat
> cumbersome table-driven approach which defines the most likely EBCDIC
> codepage to be in use by the users of a given VM system (e.g. if that
> system is in Japan, I presume EBCDIC 939). I'd like to try to improve it.
>
> I'd really rather just use Unicode, but alas, VM does not support it (not
> sure about MVS, but I suspect not).
> --
> bc
>
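Here is the rough C sketch of the bracket test mentioned above: count the bytes that would be '[' and ']' in 1047 (0xAD/0xBD) against the ones for 037 (0xBA/0xBB) and go with whichever pair turns up more often. It's a hint, not a real detector, and the messages are just illustrative:

    #include <stdio.h>

    /* Guess 1047 vs. 037 from where the square brackets land.
     * 1047 encodes [ and ] as 0xAD and 0xBD; 037 uses 0xBA and 0xBB.
     * Whichever pair occurs more often in the raw EBCDIC input wins. */
    int main(void)
    {
        unsigned long n1047 = 0, n037 = 0;
        int c;

        while ((c = getchar()) != EOF) {
            if (c == 0xAD || c == 0xBD)
                n1047++;
            else if (c == 0xBA || c == 0xBB)
                n037++;
        }

        if (n1047 > n037)
            puts("probably 1047");
        else if (n037 > n1047)
            puts("probably 037");
        else
            puts("can't tell from the brackets");

        return 0;
    }

Feed it the raw EBCDIC bytes (not something already translated to ASCII), and the same counting idea extends to the "not" sign or any other character whose code point differs between the code pages you care about.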
