OK, I solved the problem, but maybe someone here can come up with something a bit more efficient...

There is a file in the non-z/OS world that used to be pure ASCII (actually CP437/850), but that has now been converted to UTF-8 due to further internationalisation requirements. Said file was uploaded to z/OS, processed into a set of datasets containing various reports, and those reports were later downloaded to the non-z/OS world using the same transfer method that was used for the upload, which is one of two: IND$FILE or FTP.

Both FTP and IND$FILE uploads had (and still have) no problems with CP437/850/UTF-8 data, and although a ü might not have displayed as such on z/OS, it would have transferred back as the same ü. However, a ü in UTF-8 now consists of two bytes rather than one, and that means that, replacing spaces with '=' characters, the original

|=Süd====|
|=Nord===|

report lines now come out as

|=Süd===|
|=Nord===|

when opened in the non-z/OS world with a UTF-8 aware application.
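
In hex the effect looks like this (shown with the ASCII/UTF-8 values as they appear on the PC side; '|' = '7C'x, '=' = '3D'x, and the ü is '81'x in CP437 but 'C3BC'x in UTF-8):

CP437: 7C 3D 53 81 64 3D 3D 3D 3D 7C      10 bytes, 10 display positions
UTF-8: 7C 3D 53 C3 BC 64 3D 3D 3D 7C      10 bytes, but only 9 display positions

The z/OS side still builds and pads the field to the same number of bytes, so every UTF-8 follow-on byte costs one display position.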

Given that the PC file format, luckily in this case, allows comment-type lines, I solved the problem (the z/OS dataset is processed with PL/I) by adding an extra line to the input file, consisting of the required comment delimiter, followed by "ASCII ", followed by the 224 ASCII characters from '20'x to 'ff'x. The PL/I program uses this "special meta-data comment" to transform the input data, which IND$FILE/FTP has translated to EBCDIC, into a form where all UTF-8 initial bytes are translated to '1' and all UTF-8 follow-on bytes to '0', i.e.

dcl ascii char (224); /* containing the 224 characters from '20'x to 'ff'x, read in via the additional comment record in the original non-z/OS file */
dcl utf8  char (224) init (('11111111111111111111111111111111' ||
                            '11111111111111111111111111111111' ||
                            '11111111111111111111111111111111' ||
                            '00000000000000000000000000000000' ||
                            '00000000000000000000000000000000' ||
                            '00111111111111111111111111111111' ||
                            '11111111111111111111100000000000'));
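
Picking up the meta-data comment is nothing more exotic than something along these lines (the names are illustrative only, delimiter checking and error handling omitted):

dcl inrec char (255) var;             /* the comment record, as read from the input file */
dcl p     fixed bin (31);

p = index(inrec, 'ASCII ');
if p > 0 then
  ascii = substr(inrec, p + 6, 224);  /* the 224 bytes that follow "ASCII "              */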

To then get the number of UTF-8 displayable characters of, e.g., myvar, a char (47) variable, I use the following:

dcl a47(47) pic '9';
dcl more    char (20) var;

string(a47) = translate(myvar, utf8, ascii); /* '1' for every initial byte, '0' for every follow-on byte */
more        = copy(' ', 47 - sum(a47));      /* one extra blank for every follow-on byte                 */

where "more" is the number of extra blanks that needs to be added into the report column to ensure that the columns line-out again in the non-z/OS UTF-8 world. The (relative) beauty of this approach lies in the fact that the technique is completely code-page independent, and could even be used with the PL/I compiler on Windows.

The above works like a charm; however, both translate() and sum(), especially of pic '9' data, are not exactly the most efficient functions, so the question is: can anyone think of a more efficient way, other than the quick(?) and dirty solution of using a macro on the non-z/OS side, to set "more" to the required number of blanks? I'm open to a PL/I-callable assembler routine, but the process must, like the one above, be completely code-page independent!

Robert
--
Robert AH Prins
robert.ah.prins(a)gmail.com
