Jeff Clites wrote:

> On Sep 23, 2004, at 5:27 PM, Edward Peschko wrote:
>
>> On Thu, Sep 23, 2004 at 08:15:08AM -0700, Jeff Clites wrote:
>>
>>>>
>>>> just like the transformation of a string into a number, and from a
>>>> number to a string. Two algorithmically different things as well,
>>>> but they'd damn-well better be exact inverses of each other.
>>>
>>>
>>> But they're not:
>>>
>>> " 3 foo" --> 3 --> "3"
>>
>>
>> I'd say that that's a caveat of implementation, sort of a side effect
>> of handling an error condition.
>
>
> Nope, I'd call it fundamental semantics--it allows common idioms such
> as "0 but true" in Perl5, for example. It's just an explicit part of
> the rule for how Perl (and C's strtol/atoi functions) assign numerical
> values to strings.
>

Actually, that raises a good point: Should "3 foo" convert to number 3,
or should it convert to C<3 but remainder(" foo")> ?

I can see wanting something like this for parsing, but I'm not sure if
this is the right way to get it.
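For comparison, here is a rough Python sketch of the strtol()-style rule, with the leftover text returned alongside the number. That tail is roughly what C<3 but remainder(" foo")> would carry around, and C's strtol() already hands it back through its endptr argument. The helper name parse_leading_int is made up, and it only handles ASCII digits and C-locale whitespace:

```python
import re

# Hypothetical helper mimicking strtol()-style conversion: skip leading
# C-locale whitespace, read an optional sign and a run of ASCII digits,
# and stop at the first character that can't continue the number.
_LEADING_INT = re.compile(r"[ \t\n\v\f\r]*([+-]?[0-9]+)")

def parse_leading_int(s):
    """Return (value, remainder), like strtol() plus its endptr.

    If no digits are found, return (0, s), as strtol() does.
    """
    m = _LEADING_INT.match(s)
    if m is None:
        return 0, s
    return int(m.group(1)), s[m.end():]

value, rest = parse_leading_int(" 3 foo")
print(value, repr(rest))                  # 3 ' foo'
print(str(value) == " 3 foo")             # False: the round trip is lossy
print(parse_leading_int("0 but true"))    # (0, ' but true')
```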


> But you might like this example better, which I assume will work in
> Perl6:
>
> "3" --> 3 --> "3"
>
> (In case your email viewer doesn't render that, the first string
> contains the "fullwidth digit three", a distinct, wider version of a
> 3, used in some Asian languages.)

Is that true? That is, does the fullwidth 3 actually map to the number 3,
or is it just a funny-looking character? (Doesn't Russian have such a
letter? I know that IPA uses something that looks like a 3 but calls it
a backwards E.)

Perhaps it returns C<3 but encoding("Unicode-FF00")>?

Likewise "\U0e53" --> 3 --> "3", but perhaps it should be annotated so
that it retranslates correctly: Thai digits are usually shown only when
there is a separate price for foreigners. We wouldn't want to reveal any
secrets... :)
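A quick Python check suggests the answer is yes for the fullwidth and Thai digits, and no for the Cyrillic look-alike. The annotate-and-retranslate idea is sketched with made-up helpers (parse_digits/format_digits) that just remember which block's zero the digits came from; that is an assumption about how C<but encoding(...)> might behave, not anything specified:

```python
import unicodedata

FULLWIDTH_THREE = "\uff13"   # U+FF13 FULLWIDTH DIGIT THREE
THAI_THREE = "\u0e53"        # U+0E53 THAI DIGIT THREE
CYRILLIC_ZE = "\u0417"       # U+0417 CYRILLIC CAPITAL LETTER ZE, a 3 look-alike

for ch in (FULLWIDTH_THREE, THAI_THREE, CYRILLIC_ZE):
    # digit() gives the decimal digit value, or the default for non-digits
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.digit(ch, None))

# int() accepts any Unicode decimal digit, so the round trip loses the script:
print(str(int(FULLWIDTH_THREE)))   # 3

# Hypothetical "but encoding(...)" analogue: remember the code point of the
# block's zero so the number can be written back in the same digits.
# Assumes s is a non-empty run of digits from one block, n is non-negative.
def parse_digits(s):
    zero = ord(s[0]) - unicodedata.digit(s[0])
    return int(s), zero

def format_digits(n, zero):
    return "".join(chr(zero + int(d)) for d in str(n))

n, zero = parse_digits(THAI_THREE)
print(n, format_digits(n, zero) == THAI_THREE)   # 3 True
```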

>
>
>>>> My point is that if inputting strings into grammars is low level
>>>> enough to be an op, why isn't generating strings *from* grammars?
>>>
>>>
>>> Maybe, because it's a less common thing to want to do?
>>>
>> Well, there are two responses to "that's not a common thing to
>> want to do":
>>
>> 1) it's not a common thing to want to do because it's not a useful
>> thing to do.
>> 2) it's not a common thing to want to do because it's too damn
>> difficult to do.
>>
>> I'd say that #2 is what holds. *Everybody* has difficulties with
>> regular expressions - about a quarter of my job is simply looking at
>> other people's regexes used in data transformations and deciding what
>> small bug is causing them to fail given a certain input.
>
>
> Yeah, but when a regex isn't acting how I expected it to, I know that
> because I've already got in-hand an example of a string it matches
> which I thought it wouldn't, or one it fails to match which I thought
> it should. What I want to know is *why*--what part of the regex do I
> need to change. Generating strings that would have matched wouldn't
> seem to help much.
>
> And you might be underestimating how many strings can be generated
> from even a simple regex, and how uninformative they could be. For
> example, the Perl5 regex /[a-z]{10}/ will match 141167095653376
> different strings, and it would likely be a very long time before I'd
> find out if this would match any strings starting with "x". I'd
> probably be left with the impression that it would only match strings
> starting with "aaaaa".
>
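For what it's worth, the arithmetic checks out (26**10), and a lexicographic enumeration really would take a very long time to reach "x". A Python sketch, using itertools.product as a stand-in for whatever order a real generator would pick:

```python
import itertools
import string

LETTERS = string.ascii_lowercase   # what [a-z] matches
LENGTH = 10                        # the {10} quantifier

total = len(LETTERS) ** LENGTH
print(total)                       # 141167095653376

# Lazy lexicographic enumeration: nothing is built until it is asked for.
matches = ("".join(t) for t in itertools.product(LETTERS, repeat=LENGTH))
print(list(itertools.islice(matches, 3)))
# ['aaaaaaaaaa', 'aaaaaaaaab', 'aaaaaaaaac']

# Index of the first string starting with "x": all strings starting
# with a..w (23 letters, 26**9 strings each) come before it.
first_x = LETTERS.index("x") * len(LETTERS) ** (LENGTH - 1)
print(first_x)                     # 124878584616448
```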

That's what lazy iterators/junctions are for.

If you ask Perl to generate strings from a regex, it gives you a lazy
iterator, possibly one that is sitting beneath a junction.

On the one hand, /[a-z]{10}/ is equal to any("aaaaaaaaaa", ...).
On the other hand, it's equal to all("aaaaaaaaaa", ...). I'm not sure
which flavor is better, or whether it's the act of ~~-ing or //-ing that
converts all() to any().
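To make the lazy-iterator idea concrete, here is a small Python sketch. The tuple-based AST is invented for illustration (it is not a real regex parser), and the generator yields strings only on demand, so even /[a-z]{10}/ produces its first match immediately. The "all" reading is that every generated string matches; the "any" reading is that a string matches if it equals one of them:

```python
import re
import string

# Hypothetical mini-AST for a tiny regex subset (not a real regex parser):
#   ("lit", "ab")          literal text
#   ("class", "xyz")       one character from the set
#   ("cat", [node, ...])   concatenation
#   ("alt", [node, ...])   alternation
#   ("rep", node, lo, hi)  node{lo,hi}; hi must be finite here

def generate(node):
    """Lazily yield every string the node can match."""
    kind = node[0]
    if kind == "lit":
        yield node[1]
    elif kind == "class":
        yield from node[1]
    elif kind == "cat":
        yield from _concat(node[1])
    elif kind == "alt":
        for option in node[1]:
            yield from generate(option)
    elif kind == "rep":
        _, sub, lo, hi = node
        for n in range(lo, hi + 1):
            yield from _concat([sub] * n)
    else:
        raise ValueError(f"unknown node kind: {kind!r}")

def _concat(parts):
    """Lazily yield each concatenation of one string per part, in order."""
    if not parts:
        yield ""
        return
    for head in generate(parts[0]):
        for tail in _concat(parts[1:]):
            yield head + tail

# /(foo|ba[rz]){1,2}/ in the hypothetical AST form.
PATTERN = r"(foo|ba[rz]){1,2}"
ast = ("rep",
       ("alt", [("lit", "foo"), ("cat", [("lit", "ba"), ("class", "rz")])]),
       1, 2)

strings = list(generate(ast))
print(len(strings))                                     # 12
print(all(re.fullmatch(PATTERN, s) for s in strings))   # True  ("all" reading)
print(any(s == "barbaz" for s in generate(ast)))        # True  ("any" reading)

# Laziness: /[a-z]{10}/ has 26**10 strings, but the first arrives at once.
lazy = generate(("rep", ("class", string.ascii_lowercase), 10, 10))
print(next(lazy))                                       # aaaaaaaaaa
```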

One place where this seems to be actually (instead of theoretically)
helpful is using grammars to generate. Perhaps the individual nodes in a
grammar could be overloaded with "but generate /variablename<digit>+/"
to eliminate noise, while still generating every grammatical permutation?
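Roughly what I have in mind, as a Python sketch: a toy grammar with a hypothetical OVERRIDES table standing in for the "but generate ..." annotation, so identifiers and numbers each contribute one representative while the structural alternatives are still fully enumerated (up to a depth limit, since the grammar is recursive):

```python
import string

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# each alternative being a sequence of nonterminals or literal tokens.
GRAMMAR = {
    "expr":   [["term"], ["term", "+", "expr"]],
    "term":   [["ident"], ["number"], ["(", "expr", ")"]],
    "ident":  [[c] for c in string.ascii_lowercase],
    "number": [[d] for d in string.digits],
}

# Per-node overrides in the spirit of "but generate /variablename<digit>+/":
# instead of enumerating every identifier or number, emit one representative.
OVERRIDES = {"ident": ["var1"], "number": ["42"]}

def expand(symbol, depth):
    """Lazily yield token lists for symbol, recursing at most depth levels."""
    if symbol in OVERRIDES:
        for text in OVERRIDES[symbol]:
            yield [text]
        return
    if symbol not in GRAMMAR:          # anything else is a literal token
        yield [symbol]
        return
    if depth == 0:
        return
    for alternative in GRAMMAR[symbol]:
        yield from _expand_seq(alternative, depth - 1)

def _expand_seq(symbols, depth):
    """Lazily yield every token list for a sequence of symbols."""
    if not symbols:
        yield []
        return
    for head in expand(symbols[0], depth):
        for tail in _expand_seq(symbols[1:], depth):
            yield head + tail

results = [" ".join(tokens) for tokens in expand("expr", 4)]
print(len(results))    # 28
print(results[:4])     # ['var1', '42', '( var1 )', '( 42 )']
```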

>> Running a regular expression in reverse has IMO the best potential
>> for making regexes transparent - you graphically see how they work
>> and what they match.
>
>
> How graphically?


Run the generator, print the result. "Graphically" as in C<isgraph()>,
not "Graphically" as in "Internet pr0n". :)

>
>> Why shouldn't that be reflected in the language itself?
>
>
> Maybe because if it's likely to be used mostly for debugging, and can
> be implemented in a library, then it doesn't need to be implemented as
> an operator, and contribute to the general learning curve of the
> language's syntax.


On the other hand, we have C<x> (or is it C<xx> now?), a rudimentary
form of the operator you're discussing that only works for
/($lhs){$rhs}/ style patterns.

Perhaps the string version of unary postfix '*' could emit C<all("",
"$lhs", "$lhs$lhs", "$lhs$lhs$lhs", ...)>?
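In Python terms (itertools.count standing in for the lazy list, and str * int for Perl's x), the idea would be something like this. star() is a hypothetical name, and the loop checks that each item is exactly what the fixed-count pattern matches:

```python
import itertools
import re

def star(lhs):
    """Hypothetical lazy Kleene star over strings: "", lhs, lhs+lhs, ..."""
    for n in itertools.count():
        yield lhs * n   # Python's str * int, playing the role of Perl's x

lhs = "ab"
first = list(itertools.islice(star(lhs), 4))
print(first)   # ['', 'ab', 'abab', 'ababab']

# Item n is exactly the string that /(?:lhs){n}/ matches when lhs is taken
# literally, i.e. the fixed-count case that x already covers.
for n, s in enumerate(first):
    assert re.fullmatch(f"(?:{re.escape(lhs)}){{{n}}}", s)
```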
