This is not specifically pipeline-related (although if there's a solution, I'll likely implement it using pipelines). I'd just like to pick the brains of the many very smart people here with lots of IBM mainframe experience ...
Can anyone suggest possible approaches to the problem of examining an arbitrary collection of EBCDIC text (all presumed to have been prepared using the same codepage) and determining which codepage that was? ASCII mail clients (e.g., Lotus Notes) have functionality to choose a "best match" ASCII character set for Internet mail. I would like to do the same thing for EBCDIC, so that when I convert it to ASCII, I can choose the EBCDIC-to-ASCII translation table with the highest probability of delivering the correct characters. I need to detect both single- and double-byte EBCDIC encodings.

At present I use a somewhat cumbersome table-driven approach that defines the most likely EBCDIC codepage for the users of a given VM system (e.g., if that system is in Japan, I presume EBCDIC 939). I'd like to improve on that. I'd really rather just use Unicode, but alas, VM does not support it (not sure about MVS, but I suspect not).
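For the single-byte case, the sort of "best match" scoring I have in mind might look like the sketch below: decode a sample under each candidate codepage and keep the one whose output looks most like ordinary text. This is just a sketch, not working detection code; the candidate list uses the handful of EBCDIC codecs that happen to ship with Python's standard library, and a double-byte codepage such as 939 would need an external conversion library.

import string

# Candidate single-byte EBCDIC codepages; these happen to ship with
# Python's standard library.  Double-byte codepages (e.g. 939) are
# not covered here and would need an external library.
CANDIDATES = ["cp037", "cp273", "cp500", "cp1026", "cp1140"]

# Characters we consider "ordinary text" after decoding.
PLAUSIBLE = set(string.ascii_letters + string.digits +
                string.punctuation + " \t\r\n")

def plausibility(text: str) -> float:
    """Fraction of decoded characters that look like ordinary text."""
    return sum(ch in PLAUSIBLE for ch in text) / len(text) if text else 0.0

def guess_codepage(data: bytes) -> str:
    """Pick the candidate codepage whose decoding scores highest."""
    return max(CANDIDATES,
               key=lambda cp: plausibility(data.decode(cp, errors="replace")))

# Example: round-trip a known sample and see which codepage wins.
sample = "Dear colleague, please review the attached report.".encode("cp037")
print(guess_codepage(sample))

The obvious weakness is that closely related codepages agree on the basic Latin letters, so plain English text will score a tie (and the first candidate wins). The discriminating bytes are the national-use positions (brackets, currency signs, accented letters), which is exactly where per-language character-frequency statistics would have to come in to do better.

-- bc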
