multi-character ranges

Aaron Sherman Wed, 21 Jul 2010 08:38:46 -0700

[changing the subject because it's now clear we have two different
discussions on our hands. I think we're at or closing in on a consensus for
"a" .. "z", and this discussion is "aa" .. "bb"]

On Wed, Jul 21, 2010 at 1:56 AM, Darren Duncan <dar...@darrenduncan.net> wrote:

> Aaron Sherman wrote:
>
>> 2) The spec doesn't put this information anywhere near the definition of
>> the range operator. Perhaps we can make a note? This was a source of
>> confusion for me.
>>
>
> My impression is that a "Range" primarily defines an "interval" in terms of
> 2 endpoint values such that it defines a possibly infinite set of values
> between those endpoints.
>

I don't think that has much to do with the fact that it was quite reasonable
for me to look to the definition of ".." in S03 for what the range between
two characters contains.

>> 3) It seems that there are two competing multi-character approaches and both
>> seem somewhat valid. Should we use a pragma to toggle behavior between A
>> and B:
>>
>>  A: "aa" .. "bb" contains "az"
>>  B: "aa" .. "bb" contains ONLY "aa", "ab", "ba" and "bb"
>>
>
> I would find A to be the only reasonable answer.
>

[Before I respond, let's agree that, below, I'm going to say things like
"generates" when talking about "..". What I'm describing is the idea that a
value exists in the range given, not that a range is actually a list.]

I would find B to be the only reasonable answer, but enough people seem to
think the other way that I understand there's a valid need to be able to get
both behaviors.
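
To make the two readings concrete, here is a minimal sketch in Python (not
Perl 6, and not what any spec or implementation does). The helper names
per_position_range and in_interval are hypothetical, and the sketch assumes
equal-length endpoints compared by code point:

```python
from itertools import product


def per_position_range(lo, hi):
    """Reading B: vary each character position independently between
    the corresponding characters of the two endpoints."""
    if len(lo) != len(hi):
        raise ValueError("this sketch only handles endpoints of equal length")
    # One column of candidate characters per position, by code point.
    columns = [
        [chr(c) for c in range(ord(a), ord(b) + 1)]
        for a, b in zip(lo, hi)
    ]
    return ["".join(chars) for chars in product(*columns)]


def in_interval(value, lo, hi):
    """Reading A: the range is an interval, so membership is a plain
    lexicographic comparison against the endpoints."""
    return lo <= value <= hi


print(per_position_range("aa", "bb"))          # ['aa', 'ab', 'ba', 'bb']
print(in_interval("az", "aa", "bb"))           # True
print("az" in per_position_range("aa", "bb"))  # False
```

Under reading A, "az" falls inside the interval; under reading B, it is never
produced, because the second position only runs from "a" to "b".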

> If you want B's semantics then use "..." instead; ".." should not be
> overloaded for that.
>

I wasn't really distinguishing between ".." and "..." as I'm pretty sure
they should have the same behavior, here. The case where I'm not sure they
should have the same behavior is "apple" .. "orange". Frankly, I think that
there's no right solution there. There's the one I proposed in my original
message (treat each character index as a distinct sequence and then
increment in a base defined by all of the sequences), but even I don't like
that. To generate all possible strings of length 5+ that sort between those
two is another suggestion, but then what do you expect "father-in-law" ..
"orange" to do? Punctuation throws a whole new dimension in there, and I'm
immediately lost. When you go to my Japanese example from many messages ago,
which I got from a fairly typical Web site and which contained 2 Scripts with
4 different General Categories, I begin to need pharmaceuticals.

I don't see any value in having different rules for what ".." and "..."
generate in these cases, however. (Frankly, I'm still on the fence about
"..." for single endpoints, which I think should just devolve to ".."; "..."
with a list for the LHS is another animal, of course.)

> If there were to be any similar pragma, then it should control matters like
> "collation", or what nationality/etc-specific subtype of Str the 'aa' and
> 'bb' are blessed into on definition, so that their collation/sorting/etc
> rules can be applied when figuring out if a particular $foo~~$bar..$baz is
> TRUE or not.
>

For inclusion (e.g. does "aaaaaa" .. "zzzzzz" generate "cliché") see the
single-character range discussion, which has already touched on locale
issues.
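
As a rough illustration of why the inclusion question depends on the reading
chosen, the following Python sketch (hypothetical helper names, code-point
comparison only, no locale-aware collation) tests the "cliché" example both
ways:

```python
def in_interval(value, lo, hi):
    """Interval reading: lexicographic comparison by code point."""
    return lo <= value <= hi


def in_per_position(value, lo, hi):
    """Per-position reading: every character must lie between the
    corresponding characters of the endpoints (equal lengths assumed)."""
    return (
        len(value) == len(lo) == len(hi)
        and all(a <= c <= b for c, a, b in zip(value, lo, hi))
    )


word = "clich\u00e9"  # "cliché" with a precomposed e-acute (U+00E9)

print(in_interval(word, "aaaaaa", "zzzzzz"))      # True
print(in_per_position(word, "aaaaaa", "zzzzzz"))  # False
```

The interval reading accepts the word because the comparison is decided at
the first character; the per-position reading rejects it because U+00E9 sorts
after "z" by code point. A locale-aware collation could change either answer.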

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
