Side note: you could get around some of the problems, below, but in order to
do so, you would have to exhaustively express all of Unicode using the Str
builtin module's RANGES constant. In fact, as it is now, it defines ASCII
lowercase, but doesn't define Latin lowercase. Presumably because doing so
would be a massive pain. Again, I'll point out that using script and
properties is much easier....

On Tue, Jul 20, 2010 at 10:35 PM, Solomon Foster <colo...@gmail.com> wrote:

>
> Sorry, didn't mean to imply the series operator was perfect.  (Though
> it is surprisingly awesome in  general, IMO.)  Just that the right
> questions would be about the series operator rather than Ranges.
>

So, what's the intention of the range operator, then? Is it just there to
offer backward compatibility with Perl 5? Is it a vestige that should be
removed so that we can Huffman ... down to ..?

I'm not trying to be difficult, here, I just never knew that ... could
operate on a single item as LHS, and if it can, then .. seems to be obsolete
and holding some prime operator real estate.


>
> The questions definitely look different that way: for example,
> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz is easily and
> clearly expressed as
>
>    'A' ... 'Z', 'a' ... 'z'     # don't think this works in Rakudo yet  :(
>

I still contend that this is so frequently desirable that it should have a
simpler form, but it's still going to have problems.

One example: for expressing "Katakana letters" (I use "letters" in the
Unicode sense, here) it's still dicey. There are things interspersed in the
Unicode sequence for Katakana that aren't the same thing at all. Unicode
calls them lowercase, but that's not quite right. They're smaller versions
of Katakana characters which are used more as punctuation or accents than as
syllabic glyphs the way the rest of Katakana is.

I guess you could write:

  ア, イ, ウ, エ, オ, カ ... ヂ,ツ ...モ,ヤ, ユ, ヨ ... ロ, ワ ... ヴ (add quotes to taste)

But that seems quite a bit more painful than:

 ア .. ヴ (or ... if you prefer)

Similar problems exist for many scripts (including some of Latin, we're just
used to the parts that are odd), though I think it's possible that Katakana
may be the worst because of the mis-use of Ll to indicate a letter when the
truth of the matter is far more complicated.



> That suggests to me that the current behavior of 'A' ... 'z' is pretty
> reasonable.
>

You still have to decide to make at least some allowances for invalid
codepoints and I think you should avoid ever generating a combining or
modifying codepoint in such a sequence (e.g. "Ѻ" ... "Ҋ" in Cyrillic which
contains several combining characters for currency and counting as well as
one undefined codepoint).

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs

Reply via email to