OK, I solved the problem, but maybe someone here can come up with something a
bit more efficient...
There is a file in the non-z/OS world that used to be pure ASCII (actually
CP437/850), but that has now been converted to UTF-8 due to further
internationalisation requirements. Said file was uploaded to z/OS and processed
into a set of datasets containing various reports, and those reports were later
downloaded to the non-z/OS world using the same process that was used to upload
the file, which could be one of two: IND$FILE or FTP.
Both FTP and IND$FILE uploads had (and still have) no problems with
CP437/850/UTF-8 data, and although a ü might not have displayed as such on
z/OS, it would have transferred back as the same ü. However, a ü in UTF-8 now
consists of two bytes, and that means that, replacing spaces with '='
characters, the original
|=Süd====|
|=Nord===|
report lines now come out as
|=Süd===|
|=Nord===|
when opened in the non-z/OS world with a UTF-8 aware application.
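To see the shrinkage outside z/OS, here is a minimal Python sketch (mine, not
part of the original setup) that pads a report column to a fixed record width
in bytes, the way the z/OS dataset does:

```python
# A report column is padded to a fixed width in *bytes*; once "ü"
# becomes the two UTF-8 bytes 'c3'x 'bc'x, one display position is lost.
def report_line(name: str) -> str:
    raw = b"|=" + name.encode("utf-8")
    return (raw.ljust(9, b"=") + b"|").decode("utf-8")

print(report_line("Süd"))   # |=Süd===|  : only 9 display positions
print(report_line("Nord"))  # |=Nord===| : 10 display positions
```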
Given that (and in this case I was lucky) the PC file had the option to add
comment-type lines, I solved the problem (the z/OS dataset is processed with
PL/I) by adding an extra line to the input file: the required comment
delimiter, followed by "ASCII ", followed by the 224 ASCII characters from
'20'x to 'ff'x. The PL/I program uses this "special meta-data comment" to
transform the input data, which IND$FILE/FTP has translated to EBCDIC, back
into a format where all UTF-8 lead bytes are translated to '1' and all UTF-8
follow-on bytes to '0', i.e.
dcl ascii char (224); /* the 224 characters from '20'x to 'ff'x, read
                         in via the additional comment record in the
                         original non-z/OS file                       */
dcl utf8  char (224) init ('11111111111111111111111111111111' ||
                           '11111111111111111111111111111111' ||
                           '11111111111111111111111111111111' ||
                           '00000000000000000000000000000000' ||
                           '00000000000000000000000000000000' ||
                           '00111111111111111111111111111111' ||
                           '11111111111111111111100000000000');
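For readers without a PL/I compiler to hand, the classification behind that
init string can be sketched in Python (the function name is my own); note that
it classes the bytes that can never start a valid UTF-8 sequence ('c0'x, 'c1'x
and 'f5'x-'ff'x) as '0', just like the init string:

```python
def utf8_lead(b: int) -> str:
    """'1' if byte value b can start a UTF-8 character, '0' otherwise."""
    if 0x80 <= b <= 0xBF:               # continuation (follow-on) bytes
        return "0"
    if b in (0xC0, 0xC1) or b >= 0xF5:  # bytes invalid as UTF-8 lead bytes
        return "0"
    return "1"

# the 224 entries from '20'x to 'ff'x, matching the PL/I init string
utf8_table = "".join(utf8_lead(b) for b in range(0x20, 0x100))
```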
and to get the number of UTF-8 displayable characters of, e.g. myvar, a
char (47) variable, I use the following:

dcl a47 (47) pic '9';
dcl more char (20) var;

string(a47) = translate(myvar, utf8, ascii); /* one '1' per lead byte     */
more = copy(' ', 47 - sum(a47));             /* one blank per follow-on   */
where "more" holds the extra blanks that need to be added to the report column
to ensure that the columns line up again in the non-z/OS UTF-8 world. The
(relative) beauty of this approach lies in the fact that the technique is
completely code-page independent, and could even be used with the PL/I
compiler on Windows.
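In other words (again a Python sketch, with names of my own choosing, and
ignoring the invalid-lead-byte refinement of the table), "more" is simply one
blank per UTF-8 continuation byte in the field:

```python
def extra_blanks(myvar: bytes) -> int:
    """Blanks to append so the column lines up again: the byte length
    minus the number of displayable characters, i.e. the count of
    UTF-8 continuation bytes ('80'x-'bf'x)."""
    return sum(1 for b in myvar if 0x80 <= b <= 0xBF)

field = "Süd".encode("utf-8").ljust(47)  # stand-in for the char (47) myvar
more = " " * extra_blanks(field)         # one extra blank, for the ü
```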
The above works like a charm; however, both translate() and sum(), especially
of pic '9' data, are not exactly the most efficient functions, so the question
is: can anyone think of a more efficient way, other than the quick(?) and
dirty solution of using a macro on the non-z/OS side, to set "more" to the
required number of characters? I'm open to a PL/I-callable assembler routine,
but the process must, like the one above, be completely code-page independent!
Robert
--
Robert AH Prins
robert.ah.prins(a)gmail.com
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN