Why write macros? ii, comparison and ordering
One of the HLASM's notional eccentricities, shared with its
immediate predecessors, is that string comparisons consider first
length and only then code-point bit-string values. Agreeably, for
example,
'argon' < 'xenon',
but this is the case because and only because they are the same length
and bit-string comparison is thus controlling. On the other hand
'xenon' < 'helium'
because 'xenon' is the shorter of these two terms.
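To pin this rule down, here is a small sketch of it, in Python rather
than in the macro language, purely to fix the logic; the function name
and the choice of the cp037 EBCDIC code page are illustrative only.

  # HLASM's character-relation rule, sketched: a shorter string is
  # always 'less'; only equal-length strings are compared by the bit
  # strings of their code points.
  def hlasm_less_than(a: str, b: str) -> bool:
      if len(a) != len(b):                   # length decides first ...
          return len(a) < len(b)
      return a.encode("cp037") < b.encode("cp037")  # ... then bit strings

  print(hlasm_less_than("argon", "xenon"))   # True: equal lengths, 'a' < 'x'
  print(hlasm_less_than("xenon", "helium"))  # True: 'xenon' is shorter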
The question how to compare two simple strings of, in general,
different lengths to obtain traditional lexicographic- or
dictionary-sequence results is thus reducible to the question how to
pad the shorter of comparand and comparans to make their lengths
equal. There are two 'reasonable' choices of a padding character,
o a blank or space, ASCII x'20' or EBCDIC x'40', or
o a nul, ASCII and EBCDIC x'00'.
Padding can be explicit; or it can be implicit, built into the logic
of a machine instruction and sometimes parametric in the padding
character used.
Here nuls, instances of x'00', will be used for simple
padding-on-the-right; and such nuls are represented by the nonce
character 'µ', the Greek minuscule mu, in examples. After such
padding, for example,
'xenonµ' > 'helium'.
This indeed is enough to make these two strings comparable, and
when they are comparable/sortable I will say that they have been
normalized.
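The padding itself is trivial in any notation. A sketch, again in
Python, with an illustrative helper name; ASCII byte values are used
here, but the outcome is the same in EBCDIC.

  # Nul padding on the right: once both comparands are padded to a
  # common length, a plain byte comparison gives dictionary order.
  def pad_right(s: bytes, width: int) -> bytes:
      return s.ljust(width, b"\x00")         # pad with x'00', the 'µ' above

  a, b = b"xenon", b"helium"
  width = max(len(a), len(b))
  print(pad_right(a, width) > pad_right(b, width))  # True: 'xenonµ' > 'helium'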
More is often required. Consider the external decimal
representations of some signed halfword binary integers
{-5, -00005, +-05, -0005, 83, -29005, +0}.
Now the range of such a signed halfword value H is of course just
-2^15 <= H <= +2^15 - 1,
-32768 <= H <= +32767.
Normalization rules for them are then immediate: 1) Numeric substrings
must be padded on the LEFT with zero characters to make them all five
characters in length, 2) prefixed multi-character sign substrings must
be reduced to a single, algebraically equivalent sign character, and
3) 'unsigned' values must be supplied with a default '+' sign.
Applying these rules to the set shown above yields the normalized
elements
-00005
-00005
-00005
-00005
+00083
-29005
+00000
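The three rules are easily sketched in Python; the helper name
normalize_halfword and its fixed width of five are the only things
added to what the rules themselves say.

  # Rules 2 and 3: collapse any multi-character sign prefix to a single
  # algebraically equivalent sign, defaulting to '+'; rule 1: pad the
  # digits on the left with zeros to a width of five.
  def normalize_halfword(s: str, width: int = 5) -> str:
      sign, i = "+", 0
      while i < len(s) and s[i] in "+-":
          if s[i] == "-":
              sign = "-" if sign == "+" else "+"
          i += 1
      return sign + s[i:].zfill(width)

  values = ["-5", "-00005", "+-05", "-0005", "83", "-29005", "+0"]
  print([normalize_halfword(v) for v in values])
  # ['-00005', '-00005', '-00005', '-00005', '+00083', '-29005', '+00000']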
These values are now immediately comparable and sortable, and an
important by-product of this comparability is that duplicate values
that had several different informal external representations can now
be recognized as such. (The elements of a set are by definition
unique, so that for a set of n elements an ordering
k(1) < k(2) < . . . < k(i) < . . . < k(n)
always obtains. An n-element multiset, for which a partial ordering
k(1) <= k(2) <= . . . <= k(i) <= . . . <= k(n)
always obtains, may contain duplicates, one or more subsequences of the form
k(j) = k(j+1) = . . . = k(j+d-1)
of d duplicate/equal elements. This difference is sometimes
important. Search schemes for sets and multisets must, for example,
be different.)
A single further example of a normalization scheme must suffice
here. Consider the external, engineering-notation representations of
some HFP floating-point values,
0.0e0
+0.0e+000
+3.1415926E+00
55.55555555e-07
--7
Partial normalizations of the form
+0.000000000e+00
+0.000000000e+00
+3.141592600e+00
+5.555555555e-06
+7.000000000e+00
(for target single-precision HFP values) are some help; they permit
duplicates to be identified. Lexicographic comparisons that yield an
algebraic ordering are, however, only possible after a further
transformation that converts these values into analogues of their
internal (hardware) coded-arithmetic representations. Their exponents
are biased using a linear transformation that converts signed values
into always-nonnegative ones. An HFP exponent e always falls in the
interval
-64 <= e <= +63.
Adding 64 to an exponent value e yields a biased exponent value b in
the interval
0 <= b <= 127.
Padding a character-string representation of b on the left with zero
characters to make it three (sic) characters in length, and moving it
to the position immediately after the sign character, then yields
+064e0.000000000
+064e0.000000000
+064e3.141592600
+058e5.555555555
+064e7.000000000
for which a lexicographic sort yields an algebraic ordering or partial
ordering or, better, would do so but for a small further complication
that is discussed immediately below.
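Both steps can be sketched briefly, once more in Python rather than in
the macro language; the helper names are illustrative, and the parsing
assumes inputs shaped like those in the example, not arbitrary
floating-point text.

  # Step 1: partial normalization to the fixed form sd.ddddddddde±dd.
  def partial_normalize(s: str) -> str:
      sign, i = "+", 0
      while s[i] in "+-":                    # collapse any multi-character sign
          if s[i] == "-":
              sign = "-" if sign == "+" else "+"
          i += 1
      return sign + format(float(s[i:]), ".9e")

  # Step 2: bias the exponent by 64, pad it on the left to three
  # characters, and move it in front of the mantissa.
  def hfp_sort_key(p: str) -> str:
      mantissa, exponent = p.split("e")
      b = int(exponent) + 64                 # -64..+63 becomes 0..127
      return mantissa[0] + str(b).zfill(3) + "e" + mantissa[1:]

  raw = ["0.0e0", "+0.0e+000", "+3.1415926E+00", "55.55555555e-07", "--7"]
  for r in raw:
      p = partial_normalize(r)
      print(p, hfp_sort_key(p))
  # +0.000000000e+00 +064e0.000000000
  # +0.000000000e+00 +064e0.000000000
  # +3.141592600e+00 +064e3.141592600
  # +5.555555555e-06 +058e5.555555555
  # +7.000000000e+00 +064e7.000000000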
This example cheats a little. When both '+' and '-' are present a
further complication arises. Consider some already normalized
external decimal representations of signed binary halfword values
-00005
-00003
-00018
+00122
+00008
+00001
Putting them into ascending lexicographic sequence yields
+00001
+00008
+00122
-00003
-00005
-00018
For both ASCII and EBCDIC
'-' > '+',
and the sequence of the negative elements needs to be exactly
reversed. Avoiding mathematosis and examining the obtained
lexicographic and required algebraic subscripts of these six elements,
we obtain
lexicographic algebraic
sequence sequence
1 4
2 5
3 6
4 3
5 2
6 1
For k a count of those of the n lexicographically sorted elements that
are negative, the transformation
· if e(i) >= 0 then A(i) = L(i) + k,
· else A(i) = n - L(i) + 1
converts a lexicographic subscript L(i) into an algebraic one A(i).
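Applied to the six elements above, a sketch of this transformation,
with the names again illustrative:

  # k counts the negative elements among the n lexicographically sorted
  # values; nonnegative subscripts are shifted up by k, and the block of
  # negative elements has its order reversed.
  lex = ["+00001", "+00008", "+00122", "-00003", "-00005", "-00018"]
  n = len(lex)
  k = sum(1 for e in lex if e[0] == "-")

  def algebraic_subscript(L: int, element: str) -> int:
      if element[0] != "-":                  # e(i) >= 0
          return L + k
      return n - L + 1                       # reverse the negative block

  for L, e in enumerate(lex, start=1):
      print(L, algebraic_subscript(L, e))    # reproduces the table above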
Legend hath it that sorting/ordering operations are difficult at
assembly time. In the HLASM's macro language they are, if not wholly
trivial, easy enough.
John Gilmore, Ashland, MA 01721 - USA