On Mon, Aug 8, 2016 at 11:24 AM, Eric Covener <cove...@gmail.com> wrote:

> On Mon, Aug 8, 2016 at 12:03 PM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
> > Easier is to do a compile time comparison of '\n' to 0x15 vs 0x25. But I
> > need to know the mystery of 0x25's value through iconv on your
> > architecture. Please research, if they simply trade places we are fine.
> > If they both map to 0x0A in ASCII we simply treat them as equal in our
> > comparison fn. And the resulting table will be correct irrespective of
> > what iconv munging has been performed.
>
> On z/OS the 15 and 25 are inverted:
>
> $ printf "\r\n\x25" |od -t x1
> 0000000000    0D  15  25
> 0000000003
> $ printf "\r\n\x25" |iconv -f IBM1047 -t ISO8859-1|od -t x1
> 0000000000    0D  0A  85
>
> (same result for 037 codepage, confirmed character constants compiled the
> same)
>

Thanks, that's trivial to account for in the fixed table: '\n' == 0x15 (vs
0x25).

I think we could accommodate 037 vs. 1047 by simply comparing these values:

        [  ]  ^
 037:  BA BB B0
1047:  AD BD 5F
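
A minimal sketch of that comparison, written in Python rather than C and not
taken from httpd or APR. The function name guess_code_page and the labels it
returns are hypothetical; the byte values are the ones quoted in this thread.
In C the four arguments would simply be the character constants '\n', '[',
']' and '^' as the compiler sees them.

```python
# Byte values for '[', ']', '^' as quoted in this thread.
BRACKETS_1047 = (0xAD, 0xBD, 0x5F)
BRACKETS_037 = (0xBA, 0xBB, 0xB0)


def guess_code_page(newline, lbracket, rbracket, caret):
    """Guess a charset label from four byte values (hypothetical helper)."""
    if (newline, lbracket, rbracket, caret) == (0x0A, 0x5B, 0x5D, 0x5E):
        return "ascii"
    # '\n' may be 0x15 (z/OS compilers) or 0x25 (iconv-style mapping);
    # both are treated as EBCDIC newlines here.
    if newline not in (0x15, 0x25):
        return "unknown"
    triple = (lbracket, rbracket, caret)
    if triple == BRACKETS_1047:
        return "ibm1047"
    if triple == BRACKETS_037:
        return "ibm037"
    return "unknown"


# z/OS as reported above: '\n' compiles to 0x15 on both code pages.
print(guess_code_page(0x15, 0xAD, 0xBD, 0x5F))
print(guess_code_page(0x15, 0xBA, 0xBB, 0xB0))
```

Anything that matches neither triple falls through to "unknown", which is
where the code pages in the attached list would land.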

Beyond this, things start to get weird; I've attached the list.

Particularly, 937 and 1399 are problematic because we can't test '\x0f' ==
0x0f (a hex escape always yields the literal value), and there is no standard
C escape sequence for 0x0e/0x0f to use as a compile time trigger.
Interestingly, those are the only C0 codes that ever seem to differ between
EBCDIC code pages.

For the most part, however, we only care that our Alpha upper-lower mapping
matches 1:1 for apr_cstr_casecmp. For these ap_* util.c functions, we also
care that all C0 chars fall in the same C0 set of mappings, and that all other
values which map to ASCII translate to some ASCII visible character value.
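
The first two of those conditions can be checked mechanically. The sketch
below is an illustration only: it uses the cp037 and cp500 codecs bundled with
Python 3 as stand-ins for whatever iconv offers, the helper names are
hypothetical, and the third condition (visible characters) is left out.

```python
import string


def letters_match(codec_a, codec_b):
    """True if both codecs encode every ASCII letter to the same byte."""
    letters = string.ascii_letters
    return letters.encode(codec_a) == letters.encode(codec_b)


def c0_bytes(codec):
    """Set of byte values that the C0 controls 0x00..0x1F encode to."""
    controls = bytes(range(0x20)).decode("ascii")
    return set(controls.encode(codec))


def same_c0_set(codec_a, codec_b):
    """True if both codecs place the C0 controls on the same set of bytes."""
    return c0_bytes(codec_a) == c0_bytes(codec_b)


print(letters_match("cp037", "cp500"))
print(same_c0_set("cp037", "cp500"))
```

Comparing sets rather than positions is deliberate: two controls trading
places, as 0x15 and 0x25 do, still counts as the same C0 set.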

Beginning to think that a simple run-time program to regenerate this table,
for when the user chooses and is running on an unusual code page, could be
valuable, using iconv directly and not through apr to preserve
cross-compilation, and verifying in our ap_init_ebcdic that we are running
with the correct code page.
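
A rough sketch of what such a regeneration step might look like. It is written
in Python 3 and uses Python's bundled cp037 codec in place of iconv (Python
does not appear to ship an IBM-1047 codec), so treat it as an illustration of
the idea only. All function names and the C array name are hypothetical and
are not existing httpd or APR symbols.

```python
def ascii_to_ebcdic_table(codec):
    """Return 128 ints: the byte each ASCII code point encodes to."""
    table = []
    for code_point in range(128):
        encoded = chr(code_point).encode(codec)
        if len(encoded) != 1:
            raise ValueError("%s: U+%04X is not a single byte"
                             % (codec, code_point))
        table.append(encoded[0])
    return table


def format_c_array(name, table):
    """Render the table as C source text, eight entries per line."""
    lines = ["static const unsigned char %s[128] = {" % name]
    for row in range(0, 128, 8):
        cells = ", ".join("0x%02X" % b for b in table[row:row + 8])
        lines.append("    " + cells + ",")
    lines.append("};")
    return "\n".join(lines)


def running_code_page_ok(table, probes):
    """probes maps an ASCII character to the byte seen at run time."""
    return all(table[ord(ch)] == byte for ch, byte in probes.items())


table = ascii_to_ebcdic_table("cp037")
print(format_c_array("hypothetical_a2e_table", table))
print(running_code_page_ok(table, {"[": 0xBA, "]": 0xBB, "^": 0xB0}))
```

The probe check is the part that would correspond to the verification in
ap_init_ebcdic: a handful of character constants compared against the
generated table at startup.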

Acceptable mapping of ASCII for ibm1047, cp1047, osf10020417, ibm-1137, 1047,
ibm-1047, ibm1137, csibm1137, cp1137

Unusually mapped code pages are listed in this three-line format:
Translated Characters (\escape or 2 digit ASCII C0 code)
Resulting Translation (2 digit EBCDIC from the indicated code pages)
Expected Translation (2 digit EBCDIC from CP1047)

Unusual mapping of ASCII for cp424, ibm1140, ibm-12712, ebcdic-cp-he, cp282, 
ebcdic-cp-wt, ibm-1156, ibm1112, csibm037, cp1156, csibm1156, ibm1156, 
ibm-1112, ibm037, csibm1112, cp1112, ebcdic-cp-ca, ibm-1140, cp1140, 
ebcdic-cp-nl, cp12712, ebcdic-cp-us, osf100201a8, ibm424, cp1070, osf10020025, 
ibm12712, csibm424, csibm12712, csibm1140, cp037
 [  ]  ^ 
BA BB B0 
AD BD 5F 

Unusual mapping of ASCII for cp1399, ibm1399, csibm1399, ibm-1399
0e 0f  ^ 
3F 3F B0 
0E 0F 5F 

Unusual mapping of ASCII for cp1160, ibm9030, csibm1160, csibm9030, ibm-9030, 
ibm1160, ibm1132, ibm-1132, ibm-1160, cp9030, csibm1132, cp1132
 [  ]  ^ 
49 59 69 
AD BD 5F 

Unusual mapping of ASCII for ibm1025, ibm-1123, ibm-1153, ibm-1158, 
ebcdic-cp-roece, ibm870, cp1158, cp1154, cp1153, cp1123, csibm1158, csibm1153, 
ibm1153, ibm1154, ibm1158, ibm-4971, ebcdic-cp-yu, csibm1123, cp4971, cp1025, 
csibm1166, ibm-1025, csibm1025, csibm1154, ibm1166, ibm1123, cp870, cp1166, 
ibm-1154, csibm4971, ibm-1166, ibm4971, osf10020366, csibm870
 !  [  ]  | 
4F 4A 5A 6A 
5A AD BD 4F 

Unusual mapping of ASCII for ibm1148, ibm500, ebcdic-int1, ibm1164, csibm1164, 
ebcdic-cp-ch, ibm-1148, ibm-1130, 500v1, cp1148, osf100201f4, 500, ibm256, 
cp1084, cp1164, ibm1130, cp500, csibm500, ebcdic-cp-be, csibm1130, ibm-1164, 
cp1130, csibm1148
 !  [  ]  | 
4F 4A 5A BB 
5A AD BD 4F 

Unusual mapping of ASCII for osf10020396, ibm918, csibm918, cp918, ebcdic-cp-ar2
 !  [  ]  `  | 
4F 4A 5A 6A BB 
5A AD BD 79 4F 

Unusual mapping of ASCII for ibm937, ibm-937, cp937, csibm937
0e 0f  [  ]  ^ 
3F 3F BA BB B0 
0E 0F AD BD 5F 

Unusual mapping of ASCII for ibm939, ibm-939, cp939, csibm939
0e 0f  \  ^  ~ 
3F 3F B2 B0 A0 
0E 0F E0 5F A1 

Unusual mapping of ASCII for ibm1146, cp285, csibm285, ibm-1146, cp1146, 
ibm285, ebcdic-cp-gb, osf1002011d, csibm1146
 $  [  ]  ^  ~ 
4A B1 BB BA BC 
5B AD BD 5F A1 

Unusual mapping of ASCII for ibm1145, cp284, osf1002011c, csibm284, ibm-1145, 
cp1145, ibm284, cp1079, ebcdic-cp-es, csibm1145
 !  #  [  ]  ^  ~ 
BB 69 4A 5A BA BD 
5A 7B AD BD 5F A1 

Unusual mapping of ASCII for ibm933, ibm-933, cp933, ibm-1364, csibm1364, 
ibm1364, cp1364, csibm933
0e 0f  [  \  ]  ^ 
3F 3F 70 B2 80 B0 
0E 0F AD E0 BD 5F 

Unusual mapping of ASCII for ibm-1371, cp1371, csibm1371, ibm1371
0e 0f  [  \  ]  ^ 
3F 3F BA 5B BB B0 
0E 0F AD E0 BD 5F 

Unusual mapping of ASCII for ibm935, ibm-935, ibm-1388, cp935, csibm1388, 
cp1388, ibm1388, csibm935
0e 0f  $  [  \  ]  ^  ~ 
3F 3F E0 BA B2 BB B0 A0 
0E 0F 5B AD E0 BD 5F A1 

Unusual mapping of ASCII for ibm1141, ibm273, csibm273, ibm-1141, cp1141, 
cp273, osf10020111, csibm1141
 !  @  [  \  ]  {  |  }  ~ 
4F B5 63 EC FC 43 BB DC 59 
5A 7C AD E0 BD C0 4F D0 A1 

Unusual mapping of ASCII for ibm1142, osf10020115, ibm277, csibm277, 
ebcdic-cp-no, ebcdic-cp-dk, cp1142, ibm-1142, csibm1142
 !  #  $  @  [  ]  {  |  }  ~ 
4F 4A 67 80 9E 9F 9C BB 47 DC 
5A 7B 5B 7C AD BD C0 4F D0 A1 

Unusual mapping of ASCII for ibm278, csibm278, osf10020116, cp278, 
ebcdic-cp-fi, ebcdic-cp-se
 !  #  $  @  [  ]  `  {  |  }  ~ 
4F 63 67 EC B5 9F 51 43 BB 47 DC 
5A 7B 5B 7C AD BD 79 C0 4F D0 A1 

Unusual mapping of ASCII for ibm1144, cp280, ibm-1144, cp1144, ibm280, 
csibm280, ebcdic-cp-it, osf10020118, csibm1144
 !  #  @  [  \  ]  `  {  |  }  ~ 
4F B1 B5 90 48 51 DD 44 BB 54 58 
5A 7B 7C AD E0 BD 79 C0 4F D0 A1 

Unusual mapping of ASCII for ibm1147, cp297, ibm-1147, osf10020129, cp1147, 
cp1081, csibm297, ibm297, ebcdic-cp-fr, csibm1147
 !  #  @  [  \  ]  `  {  |  }  ~ 
4F B1 44 90 48 B5 A0 51 BB 54 BD 
5A 7B 7C AD E0 BD 79 C0 4F D0 A1 

Unusual mapping of ASCII for ibm1149, ibm-1149, cp1149, cp871, ibm871, 
ebcdic-cp-is, csibm871, osf10020367, csibm1149
 !  @  [  \  ]  ^  `  {  |  }  ~ 
4F AC AE BE 9E EC 8C 8E BB 9C CC 
5A 7C AD E0 BD 5F 79 C0 4F D0 A1 

Unusual mapping of ASCII for ibm1143, ibm-1122, ibm-1157, cp1157, cp1122, 
csibm1122, ibm1157, ibm-1143, cp1143, csibm1157, ibm1122, csibm1143
 !  #  $  @  [  \  ]  `  {  |  }  ~ 
4F 63 67 EC B5 71 9F 51 43 BB 47 DC 
5A 7B 5B 7C AD E0 BD 79 C0 4F D0 A1 

Unusual mapping of ASCII for ibm1026, ibm-1155, cp1155, ibm1155, cp1026, 
csibm1026, csibm1155, 1026, osf10020402
 !  "  #  $  @  [  \  ]  `  {  |  }  ~ 
4F FC EC AD AE 68 DC AC 8D 48 BB 8C CC 
5A 7F 7B 5B 7C AD E0 BD 79 C0 4F D0 A1 

Unusual mapping of ASCII for ibm930, ibm-930, cp930, csibm930
0e 0f  $  [  \  ]  ^  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z 
3F 3F E0 70 5B 80 B0 62 63 64 65 66 67 68 69 71 72 73 74 75 76 77 78 8B 9B AB B3 B4 B5 B6 B7 B8 B9 
0E 0F 5B AD E0 BD 5F 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96 97 98 99 A2 A3 A4 A5 A6 A7 A8 A9 

Unusual mapping of ASCII for cp1390, ibm-1390, csibm1390, ibm1390
0e 0f  $  [  \  ]  ^  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z  ~ 
3F 3F E0 70 B2 80 B0 62 63 64 65 66 67 68 69 71 72 73 74 75 76 77 78 8B 9B AB B3 B4 B5 B6 B7 B8 B9 A0 
0E 0F 5B AD E0 BD 5F 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96 97 98 99 A2 A3 A4 A5 A6 A7 A8 A9 A1 

Unable to translate basic 128 ASCII characters in ibm16804, ibm1097, 
csebcdicatde, cp420, cp423, cp281, ibm880, ebcdic-fr, ibm281, csibm880, 
csebcdicdkno, ebcdic-us, csibm4517, ibm803, ebcdicdknoa, ebcdic-br, csibm038, 
ebcdic-be, ibm274, osf10020370, csebcdices, csibm275, csibm274, ebcdic-ca-fr, 
ebcdic-is-friss, ebcdic-dk-no, csebcdicfise, csebcdicdknoa, cp803, csibm16804, 
ibm038, ibm-803, ebcdices, ibm275, csebcdicit, ibm4517, csebcdicpt, 
ebcdicisfriss, csebcdicess, csibm281, csebcdicesa, ebcdicess, csebcdicfr, 
osf10020122, ebcdic-es-a, ebcdic-es-s, ebcdicesa, ebcdicpt, ebcdic-es, 
ebcdic-dk-no-a, ebcdic-cp-gr, ebcdiccafr, ebcdicfisea, ebcdicit, ibm4899, 
ebcdic-jp-e, ibm-16804, cp275, cp274, csibm423, csibm420, ibm-4899, cp4517, 
ebcdic-at-de, csibm4899, cp875, ibm875, ebcdic-pt, ebcdicfr, csibm290, 
ebcdicfise, csebcdicfisea, cp290, ebcdic-it, ibm423, ibm420, ibm290, 
csebcdiccafr, ebcdic-uk, ebcdic-at-de-a, cp16804, ebcdic-fi-se-a, cp4899, 
csebcdicuk, csibm803, csebcdicus, osf100201a4, ebcdic-jp-kana, ibm-1097, 
ebcdicuk, ebcdicus, csebcdicatdea, ebcdicatde, ebcdic-greek, ebcdic-fi-se, 
ebcdicatdea, csibm1097, ebcdic-int, ebcdic-cyrillic, ebcdic-cp-ar1, ebcdicdkno, 
ibm-4517, osf1002036b, cp880, cp1097, cp038


#!/usr/bin/python
#
# Python 2 script.
# Relies on https://pypi.python.org/pypi/iconv_codecs 0.2a1 or later

import iconv_codecs

codepages = iconv_codecs.get_supported_codecs()

# ASCII code points 0..126 as one string; DEL (127) is not compared
ascii_chars = u''.join(map(unichr, range(0, 127)))

# Reference translation: CP1047
nstr = ascii_chars.encode('iconv:ibm-1047')

cleanmaps = list()    # code pages whose translation equals CP1047
failedmaps = list()   # EBCDIC code pages that cannot encode all of ASCII
obscuremaps = list()  # lists of code page names, grouped by translation
obscureelts = list()  # the translation shared by each group above

for cod in codepages:
  try:
    # Only EBCDIC code pages are of interest: 'A' must map to 0xC1
    if 'A'.encode('iconv:' + cod) != '\xC1':
      continue
  except Exception:
    continue
  try:
    cstr = ascii_chars.encode('iconv:' + cod)
  except Exception:
    failedmaps.append(cod)
    continue
  if cstr == nstr:
    cleanmaps.append(cod)
  elif cstr in obscureelts:
    obscuremaps[obscureelts.index(cstr)].append(cod)
  else:
    obscuremaps.append([ cod ])
    obscureelts.append(cstr)

print 'Acceptable mapping of ASCII for ' + ', '.join(cleanmaps)

# Walk every group, including the last one
for names, cstr in zip(obscuremaps, obscureelts):
  print 'Unusual mapping of ASCII for ' + ', '.join(names)
  chars = cdiff = ndiff = ''
  for i in range(0, 127):
    # Slices yield '' instead of raising if a translation is short
    c = cstr[i:i+1]
    n = nstr[i:i+1]
    if c != n:
      if i >= 32:
        chars += ' ' + chr(i) + ' '
      else:
        # Use the C-style escape (\n, \t, \r) when one exists, else hex
        esc = unichr(i).encode('unicode-escape')
        if len(esc) == 2:
          chars += esc + ' '
        else:
          chars += "%02x " % i
      cdiff += ("%02X " % ord(c)) if c else '   '
      ndiff += ("%02X " % ord(n)) if n else '   '
  print chars
  print cdiff
  print ndiff

print 'Unable to translate basic 128 ASCII characters in ' + ', '.join(failedmaps)
