On 1/11/11 11:13 AM, Michel Fortin wrote:
On 2011-01-11 11:36:54 -0500, Andrei Alexandrescu said:

On 1/11/11 4:41 AM, Michel Fortin wrote:
For instance, say we have a conversion range taking a Unicode string and
converting it to ISO Latin 1. The best (lossy) conversion for "œ" is
"oe" (one character to two characters), in this case 'front' could
simply return "oe" (two characters) in one iteration, with stepSize
being the size of the "œ" code point. In the same conversion process,
encountering "e" followed by a combining "´" would return pre-combined
character "é" (two characters to one character).
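The conversion described above could be sketched in D roughly as follows. This is only an illustration under stated assumptions: the name Latin1Converter, the private helper prime, and the choice of "?" for unrepresentable code points are all hypothetical, and only the two special cases mentioned in the message are handled. 'front' returns a small array so that one step can yield several output characters, and 'stepSize' counts UTF-8 code units.

```d
import std.utf : decode;

/// Hypothetical lossy UTF-8 -> ISO Latin 1 conversion range (illustration only).
struct Latin1Converter
{
    string source;           // remaining UTF-8 input

    private ubyte[] current; // Latin 1 bytes produced by the current element
    private size_t step;     // UTF-8 code units the current element spans

    this(string s) { source = s; prime(); }

    @property bool empty() { return source.length == 0; }
    @property ubyte[] front() { return current; }
    @property size_t stepSize() { return step; }

    void popFront() { source = source[step .. $]; prime(); }

    // Decode the next logical element and cache its conversion.
    private void prime()
    {
        current = null;
        step = 0;
        if (source.length == 0) return;

        size_t i = 0;
        immutable dchar c = decode(source, i);

        if (c == '\u0153') // "œ": one code point becomes two bytes
        {
            current = [cast(ubyte) 'o', cast(ubyte) 'e'];
        }
        else if (c == 'e' && i < source.length)
        {
            size_t j = i;
            if (decode(source, j) == '\u0301') // combining acute accent
            {
                current = [cast(ubyte) 0xE9];  // pre-combined "é"
                i = j;                         // two code points consumed
            }
            else
            {
                current = [cast(ubyte) 'e'];
            }
        }
        else
        {
            // Assumption: anything outside Latin 1 is replaced by '?'.
            current = [c < 0x100 ? cast(ubyte) c : cast(ubyte) '?'];
        }
        step = i;
    }
}

unittest
{
    auto r = Latin1Converter("\u0153e\u0301x");
    assert(r.front == [cast(ubyte) 'o', cast(ubyte) 'e'] && r.stepSize == 2);
    r.popFront();
    assert(r.front == [cast(ubyte) 0xE9] && r.stepSize == 3);
}
```

Note that 'stepSize' here measures the input, not the output, which is why it still makes sense when one element expands to two bytes.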

In the design as I thought of it, the effective length of one logical
element is one or more representation units. My understanding is that
you are referring to a fractional number of representation units for
one logical element.

Your understanding is correct.

I think both cases (one becomes many & many becomes one) are important
and must be supported. Your proposal only deals with the many-becomes-one
case.

I disagree. When I suggested this design I was worried about over-abstracting. Now this looks like abstracting for stuff that hasn't even been addressed concretely yet.

Besides, using the bit as an encoding unit sounds like an acceptable approach for anything fractional.

I proposed returning arrays so we can deal with the one-becomes-many
case ("œ" becoming "oe"). Another idea would be to introduce "substeps".
When checking for the next character, in addition to determining its
step length you could also determine the number of substeps in it. "œ"
would have two substeps, "o" and "e", and when there is no longer any
substep you move to the next step.
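The "substeps" idea might look something like the sketch below. All names (SubstepConverter, expand, substeps) are hypothetical, the expansion table covers only "œ" and ASCII, and the range yields one output character per substep instead of an array per step.

```d
import std.utf : decode, stride;

/// Hypothetical range where one step may contain several substeps.
struct SubstepConverter
{
    string source;          // remaining UTF-8 input
    private size_t substep; // position inside the current step's expansion

    // Assumed expansion table: "œ" -> "oe", ASCII unchanged, the rest '?'.
    private static string expand(dchar c)
    {
        if (c == '\u0153') return "oe";
        return c < 0x80 ? [cast(char) c].idup : "?";
    }

    @property bool empty() { return source.length == 0; }

    // Code units occupied by the current step in the input.
    @property size_t stepSize() { return stride(source, 0); }

    // Number of output characters the current step expands to.
    @property size_t substeps()
    {
        size_t i = 0;
        return expand(decode(source, i)).length;
    }

    @property char front()
    {
        size_t i = 0;
        return expand(decode(source, i))[substep];
    }

    void popFront()
    {
        if (++substep < substeps) return; // still inside the same step
        substep = 0;
        source = source[stepSize .. $];   // move on to the next step
    }
}

unittest
{
    auto r = SubstepConverter("\u0153a");
    assert(r.substeps == 2 && r.front == 'o');
    r.popFront();
    assert(r.front == 'e'); // same step, second substep
    r.popFront();
    assert(r.front == 'a');
}
```

In this sketch the input only advances once every substep of the current step has been consumed.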

All this said, I think this should stay an implementation detail as this
would allow a variety of strategies. Also, keeping this an
implementation detail means that your proposed 'stepSize' and
'backstepSize' need to be an implementation detail too (because they
won't make sense for the one-to-many case). So they can't really be part
of a standard VLE interface.

If you don't have at least stepSize that tells you how large the stride is to get to the next element, it becomes impossible to move within the range using integral indexes.
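To illustrate the point about integral indexes, here is a minimal sketch for the UTF-8 case, where std.utf.stride plays the role of stepSize. The function name indexOfElement is hypothetical, and the sketch assumes valid UTF-8 and n no larger than the number of code points in s.

```d
import std.utf : stride;

/// Returns the code-unit index at which the n-th code point of s starts.
/// Assumes s is valid UTF-8 and contains at least n code points.
size_t indexOfElement(string s, size_t n)
{
    size_t i = 0;
    foreach (_; 0 .. n)
        i += stride(s, i); // stride is the step size of the element at i
    return i;
}

unittest
{
    string s = "a\u0153b"; // 'a' (1 unit), 'œ' (2 units), 'b' (1 unit)
    assert(indexOfElement(s, 1) == 1);
    assert(indexOfElement(s, 2) == 3);
}
```

Without a per-element stride there would be no way to turn an element count into an offset into the representation.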

As far as I know, all we really need to expose to algorithms is whether
a range has elements of variable length, because this has an impact on
your indexing capabilities. The rest seems unnecessary to me, or am I
missing some use cases?

I think you could say that you don't really need stepSize because you can compute it as follows:

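auto r1 = r;                            // copy the range so r itself is not advanced
r1.popFront();                          // skip one logical element in the copy
size_t stepSize = r.length - r1.length; // representation units that element occupied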

This is tenuous, inefficient, and impossible if the underlying range doesn't support length (I realize that variable-length encodings work over ranges other than random-access ones, but then again this may be an overgeneralization).
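For a concrete comparison, the length-difference computation can be wrapped in a function and checked against a directly available step size. The sketch below does this for a D string; the name computedStepSize is hypothetical, and it relies on std.range.primitives.popFront removing one whole code point from a narrow string.

```d
import std.range.primitives : popFront;
import std.utf : stride;

/// Step size obtained indirectly, by popping a copy and comparing lengths.
/// Assumes r is non-empty, valid UTF-8.
size_t computedStepSize(string r)
{
    auto r1 = r;     // copying a slice is cheap; other ranges may not be
    r1.popFront();   // removes one code point (one or more code units)
    return r.length - r1.length;
}

unittest
{
    string s = "\u0153b";
    assert(computedStepSize(s) == 2);
    assert(computedStepSize(s) == stride(s, 0)); // same answer, asked directly
}
```

The indirect version needs both a copyable range and a length, while a primitive such as stride only inspects the leading code unit.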


Andrei
