On Mon, Aug 8, 2016 at 11:24 AM, Eric Covener <cove...@gmail.com> wrote:
> On Mon, Aug 8, 2016 at 12:03 PM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
> > Easier is to do a compile time comparison of '\n' to 0x15 vs 0x25. But I
> > need to know the mystery of 0x25's value through iconv on your architecture.
> > Please research, if they simply trade places we are fine. If they both map
> > to 0x0A in ASCII we simply treat them as equal in our comparison fn. And the
> > resulting table will be correct irrespective of what iconv munging has been
> > performed.
>
> On z/OS the 15 and 25 are inverted:
>
> $ printf "\r\n\x25" |od -t x1
> 0000000000 0D 15 25
> 0000000003
> $ printf "\r\n\x25" |iconv -f IBM1047 -t ISO8859-1|od -t x1
> 0000000000 0D 0A 85
>
> (same result for 037 codepage, confirmed character constants compiled the
> same)

Thanks, that's trivial to account for in the fixed table: '\n' == 0x15 (vs. 0x25).

I think we could accommodate 037 vs. 1047 by simply comparing these values:

     [  ]  ^
    BA BB B0   (037)
    AD BD 5F   (1047)

Beyond this, things start to get weird; I've attached the list. In particular, 937 and 1399 are problematic: they translate 0x0e/0x0f (SO/SI) to 0x3F, but we can't detect that with a test like '\x0f' == 0x0f (a numeric escape is never translated, so that comparison is always true), and there is no standard C escape sequence for 0x0e/0x0f to use as a compile-time trigger. Interestingly, those are the only C0 codes that ever seem to differ between EBCDIC code pages.

For the most part, however, we only care that our alphabetic upper/lower-case pairs map 1:1 for apr_cstr_casecmp. For these ap_* util.c functions, we also care that all C0 characters map into the same set of C0 code points, and that every other value which maps to ASCII translates to a visible ASCII character.

I'm beginning to think that a simple run-time program to regenerate this table, for users who choose to run it on an unusual code page, could be valuable. It would use iconv directly rather than through APR, to preserve cross-compilation, and ap_init_ebcdic would verify that we are running with the correct code page.
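The checks described above can be sketched in Python, the language of the attached script. This is a minimal, hypothetical sketch, not httpd or APR code: it assumes Python 3, uses the standard library's cp037 codec as a stand-in for iconv (the standard library has no cp1047 codec), and derives a 1047-like ASCII table from it using only the three [ ] ^ differences listed above. All function names are illustrative.

```python
# Hypothetical sketch (Python 3). The stdlib 'cp037' codec stands in for
# iconv; the stdlib has no cp1047, so a 1047-like table is derived from it.

def ascii_to_ebcdic(codec):
    """Return a 128-entry list: ASCII code point -> single EBCDIC byte."""
    return [chr(i).encode(codec)[0] for i in range(128)]

# '[' ']' '^' are BA BB B0 in CP037 but AD BD 5F in CP1047.
CP037_TO_CP1047 = {ord('['): 0xAD, ord(']'): 0xBD, ord('^'): 0x5F}

def cp1047_from_cp037(table):
    """Copy a CP037 table and apply the three CP1047 differences."""
    fixed = list(table)
    for code, byte in CP037_TO_CP1047.items():
        fixed[code] = byte
    return fixed

def zos_newline(table):
    """Copy a table with newline moved to 0x15, as z/OS iconv does."""
    fixed = list(table)
    fixed[ord('\n')] = 0x15
    return fixed

def looks_like_1047(table):
    """Run-time analogue of a compile-time test such as '[' == 0xAD."""
    return table[ord('[')] == 0xAD

def check_table(table, reference):
    """Return the criteria from the message that 'table' fails."""
    problems = []
    # Case-insensitive compares: every ASCII letter must land on the
    # same byte as in the reference table.
    letters = [c for c in range(128) if chr(c).isalpha()]
    if any(table[c] != reference[c] for c in letters):
        problems.append('letters differ')
    # C0 controls (0x00-0x1F) must map onto the same set of bytes.
    if {table[c] for c in range(32)} != {reference[c] for c in range(32)}:
        problems.append('C0 set differs')
    # Distinct inputs need distinct outputs, or the reverse table is ambiguous.
    if len(set(table)) != len(table):
        problems.append('not one-to-one')
    return problems

if __name__ == '__main__':
    t037 = ascii_to_ebcdic('cp037')
    t1047 = cp1047_from_cp037(t037)
    print(looks_like_1047(t037), looks_like_1047(t1047))  # False True
    print(check_table(t037, t1047))                       # []
```

With these definitions the CP037 table passes every check against the 1047-like table, since the two differ only in punctuation, while applying zos_newline() is reported as a C0 difference.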
Acceptable mapping of ASCII for ibm1047, cp1047, osf10020417, ibm-1137, 1047, ibm-1047, ibm1137, csibm1137, cp1137

Unusually mapped code pages are reported in this three-line format:
  Translated characters (the character, a \escape, or the 2-digit ASCII C0 code)
  Resulting translation (2-digit EBCDIC from the indicated code pages)
  Expected translation (2-digit EBCDIC from CP1047)

Unusual mapping of ASCII for cp424, ibm1140, ibm-12712, ebcdic-cp-he, cp282, ebcdic-cp-wt, ibm-1156, ibm1112, csibm037, cp1156, csibm1156, ibm1156, ibm-1112, ibm037, csibm1112, cp1112, ebcdic-cp-ca, ibm-1140, cp1140, ebcdic-cp-nl, cp12712, ebcdic-cp-us, osf100201a8, ibm424, cp1070, osf10020025, ibm12712, csibm424, csibm12712, csibm1140, cp037
 [  ]  ^
BA BB B0
AD BD 5F

Unusual mapping of ASCII for cp1399, ibm1399, csibm1399, ibm-1399
0e 0f  ^
3F 3F B0
0E 0F 5F

Unusual mapping of ASCII for cp1160, ibm9030, csibm1160, csibm9030, ibm-9030, ibm1160, ibm1132, ibm-1132, ibm-1160, cp9030, csibm1132, cp1132
 [  ]  ^
49 59 69
AD BD 5F

Unusual mapping of ASCII for ibm1025, ibm-1123, ibm-1153, ibm-1158, ebcdic-cp-roece, ibm870, cp1158, cp1154, cp1153, cp1123, csibm1158, csibm1153, ibm1153, ibm1154, ibm1158, ibm-4971, ebcdic-cp-yu, csibm1123, cp4971, cp1025, csibm1166, ibm-1025, csibm1025, csibm1154, ibm1166, ibm1123, cp870, cp1166, ibm-1154, csibm4971, ibm-1166, ibm4971, osf10020366, csibm870
 !  [  ]  |
4F 4A 5A 6A
5A AD BD 4F

Unusual mapping of ASCII for ibm1148, ibm500, ebcdic-int1, ibm1164, csibm1164, ebcdic-cp-ch, ibm-1148, ibm-1130, 500v1, cp1148, osf100201f4, 500, ibm256, cp1084, cp1164, ibm1130, cp500, csibm500, ebcdic-cp-be, csibm1130, ibm-1164, cp1130, csibm1148
 !  [  ]  |
4F 4A 5A BB
5A AD BD 4F

Unusual mapping of ASCII for osf10020396, ibm918, csibm918, cp918, ebcdic-cp-ar2
 !  [  ]  `  |
4F 4A 5A 6A BB
5A AD BD 79 4F

Unusual mapping of ASCII for ibm937, ibm-937, cp937, csibm937
0e 0f  [  ]  ^
3F 3F BA BB B0
0E 0F AD BD 5F

Unusual mapping of ASCII for ibm939, ibm-939, cp939, csibm939
0e 0f  \  ^  ~
3F 3F B2 B0 A0
0E 0F E0 5F A1

Unusual mapping of ASCII for ibm1146, cp285, csibm285, ibm-1146, cp1146, ibm285, ebcdic-cp-gb, osf1002011d, csibm1146
 $  [  ]  ^  ~
4A B1 BB BA BC
5B AD BD 5F A1

Unusual mapping of ASCII for ibm1145, cp284, osf1002011c, csibm284, ibm-1145, cp1145, ibm284, cp1079, ebcdic-cp-es, csibm1145
 !  #  [  ]  ^  ~
BB 69 4A 5A BA BD
5A 7B AD BD 5F A1

Unusual mapping of ASCII for ibm933, ibm-933, cp933, ibm-1364, csibm1364, ibm1364, cp1364, csibm933
0e 0f  [  \  ]  ^
3F 3F 70 B2 80 B0
0E 0F AD E0 BD 5F

Unusual mapping of ASCII for ibm-1371, cp1371, csibm1371, ibm1371
0e 0f  [  \  ]  ^
3F 3F BA 5B BB B0
0E 0F AD E0 BD 5F

Unusual mapping of ASCII for ibm935, ibm-935, ibm-1388, cp935, csibm1388, cp1388, ibm1388, csibm935
0e 0f  $  [  \  ]  ^  ~
3F 3F E0 BA B2 BB B0 A0
0E 0F 5B AD E0 BD 5F A1

Unusual mapping of ASCII for ibm1141, ibm273, csibm273, ibm-1141, cp1141, cp273, osf10020111, csibm1141
 !  @  [  \  ]  {  |  }  ~
4F B5 63 EC FC 43 BB DC 59
5A 7C AD E0 BD C0 4F D0 A1

Unusual mapping of ASCII for ibm1142, osf10020115, ibm277, csibm277, ebcdic-cp-no, ebcdic-cp-dk, cp1142, ibm-1142, csibm1142
 !  #  $  @  [  ]  {  |  }  ~
4F 4A 67 80 9E 9F 9C BB 47 DC
5A 7B 5B 7C AD BD C0 4F D0 A1

Unusual mapping of ASCII for ibm278, csibm278, osf10020116, cp278, ebcdic-cp-fi, ebcdic-cp-se
 !  #  $  @  [  ]  `  {  |  }  ~
4F 63 67 EC B5 9F 51 43 BB 47 DC
5A 7B 5B 7C AD BD 79 C0 4F D0 A1

Unusual mapping of ASCII for ibm1144, cp280, ibm-1144, cp1144, ibm280, csibm280, ebcdic-cp-it, osf10020118, csibm1144
 !  #  @  [  \  ]  `  {  |  }  ~
4F B1 B5 90 48 51 DD 44 BB 54 58
5A 7B 7C AD E0 BD 79 C0 4F D0 A1

Unusual mapping of ASCII for ibm1147, cp297, ibm-1147, osf10020129, cp1147, cp1081, csibm297, ibm297, ebcdic-cp-fr, csibm1147
 !  #  @  [  \  ]  `  {  |  }  ~
4F B1 44 90 48 B5 A0 51 BB 54 BD
5A 7B 7C AD E0 BD 79 C0 4F D0 A1

Unusual mapping of ASCII for ibm1149, ibm-1149, cp1149, cp871, ibm871, ebcdic-cp-is, csibm871, osf10020367, csibm1149
 !  @  [  \  ]  ^  `  {  |  }  ~
4F AC AE BE 9E EC 8C 8E BB 9C CC
5A 7C AD E0 BD 5F 79 C0 4F D0 A1

Unusual mapping of ASCII for ibm1143, ibm-1122, ibm-1157, cp1157, cp1122, csibm1122, ibm1157, ibm-1143, cp1143, csibm1157, ibm1122, csibm1143
 !  #  $  @  [  \  ]  `  {  |  }  ~
4F 63 67 EC B5 71 9F 51 43 BB 47 DC
5A 7B 5B 7C AD E0 BD 79 C0 4F D0 A1

Unusual mapping of ASCII for ibm1026, ibm-1155, cp1155, ibm1155, cp1026, csibm1026, csibm1155, 1026, osf10020402
 !  "  #  $  @  [  \  ]  `  {  |  }  ~
4F FC EC AD AE 68 DC AC 8D 48 BB 8C CC
5A 7F 7B 5B 7C AD E0 BD 79 C0 4F D0 A1

Unusual mapping of ASCII for ibm930, ibm-930, cp930, csibm930
0e 0f  $  [  \  ]  ^  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
3F 3F E0 70 5B 80 B0 62 63 64 65 66 67 68 69 71 72 73 74 75 76 77 78 8B 9B AB B3 B4 B5 B6 B7 B8 B9
0E 0F 5B AD E0 BD 5F 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96 97 98 99 A2 A3 A4 A5 A6 A7 A8 A9

Unusual mapping of ASCII for cp1390, ibm-1390, csibm1390, ibm1390
0e 0f  $  [  \  ]  ^  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z  ~
3F 3F E0 70 B2 80 B0 62 63 64 65 66 67 68 69 71 72 73 74 75 76 77 78 8B 9B AB B3 B4 B5 B6 B7 B8 B9 A0
0E 0F 5B AD E0 BD 5F 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96 97 98 99 A2 A3 A4 A5 A6 A7 A8 A9 A1

Unable to translate basic 128 ASCII characters in ibm16804, ibm1097, csebcdicatde, cp420, cp423, cp281, ibm880, ebcdic-fr, ibm281, csibm880, csebcdicdkno, ebcdic-us, csibm4517, ibm803, ebcdicdknoa, ebcdic-br, csibm038, ebcdic-be, ibm274, osf10020370, csebcdices, csibm275, csibm274, ebcdic-ca-fr, ebcdic-is-friss, ebcdic-dk-no, csebcdicfise, csebcdicdknoa, cp803, csibm16804, ibm038, ibm-803, ebcdices, ibm275, csebcdicit, ibm4517, csebcdicpt, ebcdicisfriss, csebcdicess, csibm281, csebcdicesa, ebcdicess, csebcdicfr, osf10020122, ebcdic-es-a, ebcdic-es-s, ebcdicesa, ebcdicpt, ebcdic-es, ebcdic-dk-no-a, ebcdic-cp-gr, ebcdiccafr, ebcdicfisea, ebcdicit, ibm4899, ebcdic-jp-e, ibm-16804, cp275, cp274, csibm423, csibm420, ibm-4899, cp4517, ebcdic-at-de, csibm4899, cp875, ibm875, ebcdic-pt, ebcdicfr, csibm290, ebcdicfise, csebcdicfisea, cp290, ebcdic-it, ibm423, ibm420, ibm290, csebcdiccafr, ebcdic-uk, ebcdic-at-de-a, cp16804, ebcdic-fi-se-a, cp4899, csebcdicuk, csibm803, csebcdicus, osf100201a4, ebcdic-jp-kana, ibm-1097, ebcdicuk, ebcdicus, csebcdicatdea, ebcdicatde, ebcdic-greek, ebcdic-fi-se, ebcdicatdea, csibm1097, ebcdic-int, ebcdic-cyrillic, ebcdic-cp-ar1, ebcdicdkno, ibm-4517, osf1002036b, cp880, cp1097, cp038
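Rows for the few EBCDIC code pages that Python 3 ships as built-in codecs (cp037, cp500 and cp1140 among them, but not cp1047) can be cross-checked without iconv_codecs. This is a hypothetical sketch: it assumes a CP1047 reference built from cp037 plus the [ ] ^ fix-ups, which is only valid for the 7-bit ASCII range, and Python's codecs place '\n' at 0x25, so it says nothing about the z/OS 0x15 convention. Controls are labelled by hex code only, a simplification of the report's \escape labels.

```python
# Hypothetical sketch (Python 3): reproduce report rows with stdlib codecs.

def ascii_row(codec):
    """Encode ASCII 0..127 one character at a time (single-byte codecs only)."""
    return [chr(i).encode(codec)[0] for i in range(128)]

# CP1047 reference for the ASCII range: cp037 plus the '[' ']' '^' changes.
REFERENCE = ascii_row('cp037')
for ch, byte in {'[': 0xAD, ']': 0xBD, '^': 0x5F}.items():
    REFERENCE[ord(ch)] = byte

def label(i):
    """Printable characters as themselves, anything else as 2-digit hex."""
    return chr(i) if 32 <= i < 127 else '%02x' % i

def diff(codec, reference=REFERENCE):
    """Return (characters, codec bytes, reference bytes) where they differ."""
    table = ascii_row(codec)
    idx = [i for i in range(128) if table[i] != reference[i]]
    return ([label(i) for i in idx],
            [table[i] for i in idx],
            [reference[i] for i in idx])

def report(codec):
    """Format a diff as three lines: characters, codec bytes, reference bytes."""
    chars, got, want = diff(codec)
    return '\n'.join([' '.join(c.rjust(2) for c in chars),
                      ' '.join('%02X' % b for b in got),
                      ' '.join('%02X' % b for b in want)])

if __name__ == '__main__':
    print(report('cp500'))
```

For cp500 this yields the same four differences as the ibm500 row (! [ ] | as 4F 4A 5A BB versus 5A AD BD 4F), and cp1140 yields the [ ] ^ row reported for cp037.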
#!/usr/bin/python
#
# Relies on https://pypi.python.org/pypi/iconv_codecs 0.2a1 or later.
# Python 2 script: it uses print statements and unichr().
import iconv_codecs

# The 128 basic ASCII characters as a unicode string.
ascii_chars = u''.join(map(unichr, range(0, 128)))

# Reference translation: CP1047.
nstr = ascii_chars.encode('iconv:ibm-1047')

cleanmaps = list()    # code pages that match CP1047 for all of ASCII
failedmaps = list()   # EBCDIC code pages that cannot translate all of ASCII
obscuremaps = list()  # groups of code pages sharing one unusual translation
obscureelts = list()  # the translated string for each group, same order

for cod in iconv_codecs.get_supported_codecs():
    try:
        # Only consider EBCDIC code pages, where 'A' is 0xC1.
        if 'A'.encode('iconv:' + cod) != '\xC1':
            continue
    except Exception:
        continue
    try:
        cstr = ascii_chars.encode('iconv:' + cod)
    except Exception:
        failedmaps.append(cod)
        continue
    if cstr == nstr:
        cleanmaps.append(cod)
    elif cstr in obscureelts:
        obscuremaps[obscureelts.index(cstr)].append(cod)
    else:
        obscuremaps.append([cod])
        obscureelts.append(cstr)

print 'Acceptable mapping of ASCII for ' + ', '.join(cleanmaps)
# Report every group as a three-line difference against CP1047.
for maps, cstr in zip(obscuremaps, obscureelts):
    print 'Unusual mapping of ASCII for ' + ', '.join(maps)
    chars = cdiff = ndiff = ''
    for i in range(0, 128):
        # Slices avoid an IndexError if one translation is shorter.
        if cstr[i:i + 1] == nstr[i:i + 1]:
            continue
        if 32 <= i < 127:
            chars += ' ' + unichr(i) + ' '
        else:
            esc = unichr(i).encode('unicode-escape')
            if len(esc) == 2:
                # Short escapes such as \n, \r and \t.
                chars += esc + ' '
            else:
                # Any other control: its 2-digit ASCII code.
                chars += '%02x ' % i
        if i < len(cstr):
            cdiff += '%02X ' % ord(cstr[i])
        else:
            cdiff += '   '
        if i < len(nstr):
            ndiff += '%02X ' % ord(nstr[i])
        else:
            ndiff += '   '
    print chars
    print cdiff
    print ndiff

print 'Unable to translate basic 128 ASCII characters in ' + ', '.join(failedmaps)
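The message above floats a run-time program that regenerates the fixed table for an unusual code page. A hypothetical sketch of that idea follows. Unlike the proposal, which would call iconv directly, it uses a Python 3 built-in single-byte codec; it assumes the target is a 256-entry byte-to-lowercase-byte table of the kind a case-insensitive compare consults, and it folds only ASCII A-Z. The names casefold_map(), as_c_array() and ucharmap_cp037 are illustrative.

```python
# Hypothetical sketch (Python 3): emit a C case-folding table for a
# single-byte EBCDIC code page, using a Python codec in place of iconv.

def casefold_map(codec):
    """Map each byte to the byte of its lowercase form (ASCII A-Z only)."""
    table = []
    for b in range(256):
        ch = bytes([b]).decode(codec)
        if 'A' <= ch <= 'Z':
            table.append(ch.lower().encode(codec)[0])
        else:
            table.append(b)  # everything else maps to itself
    return table

def as_c_array(name, table):
    """Render the table as a C initializer, 16 bytes per line."""
    lines = ['static const unsigned char %s[256] = {' % name]
    for row in range(0, 256, 16):
        cells = ', '.join('0x%02x' % v for v in table[row:row + 16])
        lines.append('    ' + cells + ',')
    lines.append('};')
    return '\n'.join(lines)

if __name__ == '__main__':
    print(as_c_array('ucharmap_cp037', casefold_map('cp037')))
```

Folding only A-Z keeps the table in line with the requirement that upper/lower-case pairs map 1:1; a code page that moves the letters, such as 930 in the report, would simply produce different table contents rather than a different algorithm.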