Re: How are unrecognized options to built-in pod block types treated?

2010-08-05 Thread Aaron Sherman
On Wed, Aug 4, 2010 at 10:05 PM, Damian Conway dam...@conway.org wrote:

 Darren suggested:

  Use namespaces.

 The upper/lower/mixed approach *is* a
 namespace approach.


It's a very C-like approach, but yes, it's certainly a crude sort of
namespace. Perl already has a more robust and modern namespace system,
however. Using it would seem wise.



  Explicit versioning is your friend.
 
  Can I get some support for this?

 Not from me.  ;-)

 I think it's a dreadful prospect to allow people to
 write documentation that they will have to rewrite when
 the Pod spec gets updated.


I would hope... really, desperately hope that the POD spec changing would be
the least of anyone's worries. If you're writing documentation, it's a
foregone conclusion that it has to be maintained, just like any other part
of your software. If the POD spec is adding new config options at a rate
that isn't several orders of magnitude less than the frequency with which
your code changes then either you're documenting the Magna Carta or we have
a problem with our documentation system.

If the latter is the case, then the right solution is to provide new
documentation features via modules and allow the user to select which new
features they desire, automatically resolving the problem, since old docs
simply won't pull in newer features.

This could go both ways, as well. use v6 might get you the default
first-pressing documentation features of Perl 6.0.0 while use v6.1 might
get you the default features of 6.1. Then you could mix it up:

 use v6;
 use Docs::SectionImage;



 Or, alternatively, to require all
 Pod parsers to be infinitely backwards compatible across
 all versions. :-(


If you never want documentation to break, then that's your only option.
Someday we're going to decide to make an incompatible change to Perl's
documentation system, and we'll have a very good reason to do so, I'd
imagine. The right thing to do will be to make sure that we roll it out
carefully and with all due warning.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: How are unrecognized options to built-in pod block types treated?

2010-08-05 Thread Carl Mäsak
Darren ():
 Read what I said again.  I was proposing that the namespace comprised of
 names matching a pattern like this:

  /^ [A..Z]+ | [a..z]+ $/

/^ [[A..Z]+ | [a..z]+] $/

// Carl


pattern alternation (was Re: How are ...)

2010-08-05 Thread Darren Duncan

Carl Mäsak wrote:

Darren ():

Read what I said again.  I was proposing that the namespace comprised of
names matching a pattern like this:

 /^ [A..Z]+ | [a..z]+ $/


/^ [[A..Z]+ | [a..z]+] $/


Are the square brackets necessary when the pattern doesn't contain anything 
other than the alternatives?


I would have thought them optional in the case I mentioned.

Rather, they would just be necessary in a case like this:

  /^ foo [[A..Z]+ | [a..z]+] bar $/

-- Darren Duncan


declaring versions (was Re: How ...)

2010-08-05 Thread Darren Duncan

Damian Conway wrote:

Darren suggested:


Use namespaces.


The upper/lower/mixed approach *is* a
namespace approach.


Yes it is.  But I thought that prefix-namespaces would scale better.  Especially 
if the documentation system got complicated enough to involve modules, possibly 
those by different sources, as some others have suggested.


That said, I'm inclined to think that the documentation system is likely to
grow in complexity by fewer orders of magnitude over time than code in general,
and so I grant that some ideas can look like over-engineering.



Explicit versioning is your friend.

Can I get some support for this?


Not from me.  ;-)

I think it's a dreadful prospect to allow people to
write documentation that they will have to rewrite when
the Pod spec gets updated. Or, alternatively, to require all
Pod parsers to be infinitely backwards compatible across
all versions. :-(


One main purpose of declaring the intended interpretation context of a work is 
so that developers of interpreters have a lot more freedom to *not* be 
backwards-compatible.  Each version is effectively a separate language in some 
ways.  Because a work declares its language version, one should be able to take 
the work anywhere and it would be completely unambiguous as to how to interpret 
it, no matter how old it is and how much the state of the art has evolved.  If 
the meaning of a keyword changes in a spec, we know without a doubt which 
meaning the user intended.


As for backwards compatibility, this is actually less onerous to implement with 
my proposal than otherwise.


For one thing, if developers want to make an incompatible change, they can 
release it right away, without a long deprecation or changeover cycle, and in 
the typical case old works will continue to be interpreted correctly.


For another thing, assuming in the typical case that any time a language 
evolves, it still provides the means to accomplish anything it was previously 
capable of, then each implementation needs no backwards-compatibility 
internally, but just the state of the art.  Backwards compatibility can be 
achieved with version-specific shims over top of this single core, which 
translate works written to an older spec to their equivalent in the new one. 
Because versions are explicitly declared, it is trivial to dispatch to the 
correct interpreter or pseudo-interpreter.


For yet another thing, parsers don't have to be infinitely backwards compatible;
they can deprecate support for particular older versions as they choose to, when
necessary and reasonable.


So, explicit versioning is actually very good for *future-proofing*.
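A minimal sketch of the "single core plus version-specific shims" idea, written in Python purely for illustration. Everything here is hypothetical: the version numbers, the `title`/`caption` option rename, and the names `upgrade`, `SHIMS` and `MIN_SUPPORTED` are invented, not part of any Pod specification.

```python
# Hypothetical sketch of version-declared dispatch: one core that only
# understands the current spec, plus a chain of shims that each rewrite a
# document from version N to version N+1. All versions, option names and
# shims here are invented for illustration.
CURRENT_VERSION = 3
MIN_SUPPORTED = 2  # older versions have been deprecated and dropped


def v2_to_v3(doc):
    # Invented example change: version 3 renamed the "title" option to "caption".
    body = dict(doc["body"])
    if "title" in body:
        body["caption"] = body.pop("title")
    return {"version": 3, "body": body}


SHIMS = {2: v2_to_v3}  # maps a version to the shim that upgrades it by one step


def upgrade(doc):
    """Translate a document that declares an older version into the current one."""
    version = doc["version"]
    if not MIN_SUPPORTED <= version <= CURRENT_VERSION:
        raise ValueError(f"unsupported spec version {version}")
    while doc["version"] < CURRENT_VERSION:
        doc = SHIMS[doc["version"]](doc)
    return doc


old = {"version": 2, "body": {"title": "Overview"}}
print(upgrade(old))  # {'version': 3, 'body': {'caption': 'Overview'}}
```

Because each document states its version, dropping support for version 2 later would amount to raising `MIN_SUPPORTED` and deleting its shim, while the core never has to change.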

I believe there are various precedents for this.

In the Perl 5 world, for example, see autodie (optional) or perl5i 
(mandatory).

  use autodie qw(:1.994);

  use perl5i::2;

I have also done this from day one in my Muldis D language, and I have no 
regrets for doing so.


-- Darren Duncan


Re: declaring versions (was Re: How ...)

2010-08-05 Thread Darren Duncan

Darren Duncan wrote:
For another thing, assuming in the typical case that any time a language 
evolves, it still provides the means to accomplish anything it was 
previously capable of, then each implementation needs no 
backwards-compatibility internally, but just the state of the art.  
Backwards compatibility can be achieved with version-specific shims over 
top of this single core, which translate works written to an older spec 
to their equivalent in the new one. Because versions are explicitly 
declared, it is trivial to dispatch to the correct interpreter or 
pseudo-interpreter.


As an addendum to this thought ...

If a system is also capable of generating a source work from a parsed version 
that is effectively the same as the original, it should also be possible for a 
user to request a source translation from some older understood spec version to 
a newer/current one.  So they can be assisted in keeping their sources up to 
date without having to manually keep updating them, in general.  Then when 
support for older formats is deprecated and removed, by that time their source 
will have been updated so it is still interpretable without manual updates.


Of course, supporting this is optional, but it's useful.

It's like a Perl 5 to Perl 6 translator, but on a much finer and easier scale.

-- Darren Duncan


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Patrick R. Michaud
On Thu, Aug 05, 2010 at 12:29:38AM -0700, Darren Duncan wrote:
 Carl Mäsak wrote:
 Darren ():
 Read what I said again.  I was proposing that the namespace comprised of
 names matching a pattern like this:
 
  /^ [A..Z]+ | [a..z]+ $/
 
 /^ [[A..Z]+ | [a..z]+] $/
 
 Are the square brackets necessary when the pattern doesn't contain
 anything other than the alternatives?

In this case yes -- the original pattern without the square brackets
would act like:

/ [^ [A..Z]+] | [[a..z]+ $] /

In other words, the original pattern says starting with uppercase
or ending with lowercase.

Pm
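The same precedence rule holds in most regex dialects, so it is easy to check outside Perl 6. A minimal sketch using Python's `re` module (an illustration, not Perl 6 syntax): Python spells the character class `[A-Z]` and the non-capturing group `(?:...)`, which play the roles of the thread's `[A..Z]` and `[...]`. (In Perl 6 regex syntax proper, a character class is written `<[A..Z]>`; bare square brackets are non-capturing groups.)

```python
import re

# In Python, as in Perl 6, "|" binds more loosely than anchors and
# concatenation, so this means (^[A-Z]+) OR ([a-z]+$).
loose = re.compile(r"^[A-Z]+|[a-z]+$")

# Grouping the alternatives makes both anchors apply to each of them.
grouped = re.compile(r"^(?:[A-Z]+|[a-z]+)$")

for s in ["ABC", "abc", "ABC123", "123abc", "Abc"]:
    print(repr(s), bool(loose.search(s)), bool(grouped.search(s)))
# 'ABC' True True
# 'abc' True True
# 'ABC123' True False
# '123abc' True False
# 'Abc' True False
```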



Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Carl Mäsak
Darren (), Carl (), Darren (), Patrick ():
 Read what I said again.  I was proposing that the namespace comprised of
 names matching a pattern like this:
 
  /^ [A..Z]+ | [a..z]+ $/
 
 /^ [[A..Z]+ | [a..z]+] $/

 Are the square brackets necessary when the pattern doesn't contain
 anything other than the alternatives?

 In this case yes -- the original pattern without the square brackets
 would act like:

    / [^ [A..Z]+] | [[a..z]+ $] /

 In other words, the original pattern says starting with uppercase
 or ending with lowercase.

I see this particular thinko a lot, though. Maybe some Perl 6 lint
tool or another will detect when you have a regex containing ^ at its
start, $ at the end, | somewhere in the middle, and no [] to
disambiguate.

// Carl
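The lint check Carl describes can be approximated with a bracket-depth scan. This is a minimal, hypothetical sketch in Python (the function name is invented, and it is a heuristic rather than a real parser): it treats `()`, `[]` and `<>` as nesting, roughly as in Perl 6 regex notation, skips backslash escapes, and does not understand quoted literals.

```python
def anchor_alternation_thinko(pattern: str) -> bool:
    """Flag a regex that starts with ^, ends with $, and contains a | that is
    not inside any group, e.g. /^ [A..Z]+ | [a..z]+ $/.

    Hypothetical heuristic: (), [] and <> are all treated as nesting, roughly
    matching Perl 6 regex notation; backslash escapes are skipped, but quoted
    literals are not understood.
    """
    body = pattern.strip()
    if len(body) >= 2 and body[0] == "/" and body[-1] == "/":
        body = body[1:-1].strip()  # accept the /.../ form used in the thread
    if not body.startswith("^") or not body.endswith("$") or body.endswith("\\$"):
        return False
    depth = 0
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch in "([<":
            depth += 1
        elif ch in ")]>":
            depth = max(depth - 1, 0)
        elif ch == "|" and depth == 0:
            return True  # top-level alternation between the two anchors
        i += 1
    return False


for p in ["/^ [A..Z]+ | [a..z]+ $/",
          "/^ [[A..Z]+ | [a..z]+] $/",
          "/ ^foo | ^bar | ^qux /"]:
    print(p, anchor_alternation_thinko(p))
```

The third pattern is not flagged because it does not end in `$`: anchoring each alternative separately is a legitimate use of the low precedence of `|`.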


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Aaron Sherman
On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:

 Darren (), Carl (), Darren (), Patrick ():

  In this case yes -- the original pattern without the square brackets
  would act like:
 
 / [^ [A..Z]+] | [[a..z]+ $] /
 
  In other words, the original pattern says starting with uppercase
  or ending with lowercase.

 I see this particular thinko a lot, though. Maybe some Perl 6 lint
 tool or another will detect when you have a regex containing ^ at its
 start, $ at the end, | somewhere in the middle, and no [] to
 disambiguate.



You know, this problem would go away, almost entirely, if we had a :f[ull]
adverb for regex matching that imposed ^[...]$ around the entire match. Then
your code becomes:

  m:f/[A..Z]+|[a..z]+/

For grins, :f[ull]l[ine] could use ^^ and $$.

I suspect :full would almost always be associated with TOP, in fact. Boy am
I tired of typing ^ and $ in TOP ;-)

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
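For comparison, Python already ships something close to the proposed `:f[ull]` adverb: `re.fullmatch` (Python 3.4+) requires the whole pattern to match the whole string. The sketch below shows that it behaves like wrapping the pattern in an anchored non-capturing group; the helper `full` is hypothetical. It uses `\Z` rather than `$` because Python's `$` also matches just before a trailing newline.

```python
import re


def full(pattern: str) -> str:
    # Hypothetical helper mirroring the proposed :f[ull] adverb: wrap the whole
    # pattern in a non-capturing group anchored at both ends of the string.
    return rf"^(?:{pattern})\Z"


pat = r"[A-Z]+|[a-z]+"
for s in ["ABC", "abc", "ABC123", "abc\n"]:
    wrapped = re.search(full(pat), s) is not None
    builtin = re.fullmatch(pat, s) is not None  # Python's built-in equivalent
    print(repr(s), wrapped, builtin)
```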


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Jon Lang
Aaron Sherman wrote:
 You know, this problem would go away, almost entirely, if we had a :f[ull]
 adverb for regex matching that imposed ^[...]$ around the entire match. Then
 your code becomes:

  m:f/[A..Z]+|[a..z]+/

 for grins, :f[ull]l[ine] could use ^^ and $$.

 I suspect :full would almost always be associated with TOP, in fact. Boy am
 I tired of typing ^ and $ in TOP ;-)

The regex counterpart of C<say $x> vs. C<print "$x\n">.  Yes,
this would indeed solve a lot of problems.  It also reflects a
tendency in some regular expression engines out there to automatically
impose full string matching (i.e., an implicit ^ at the start and $ at
the end).

That said: for mnemonic purposes, I'd be inclined to have :f do
/^[$pattern]$/, while :ff does /^^[$pattern]$$/.

-- 
Jonathan Dataweaver Lang
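The line-anchored variant (`:ff` here, `:fl` in Aaron's message) can be sketched the same way. In this hypothetical Python helper, `re.MULTILINE` makes `^` and `$` match at line boundaries, which is roughly what Perl 6's `^^` and `$$` do, so every match must span one whole line.

```python
import re


def full_lines(pattern: str, text: str) -> list:
    # Hypothetical analogue of /^^[$pattern]$$/: with re.MULTILINE, ^ and $
    # match at the start and end of each line, so only whole lines match.
    return re.findall(rf"^(?:{pattern})$", text, flags=re.MULTILINE)


text = "ABC\nabc123\nxyz\nMixed"
print(full_lines(r"[A-Z]+|[a-z]+", text))  # ['ABC', 'xyz']
```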


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread David Green
On 2010-08-05, at 8:27 am, Aaron Sherman wrote:
 On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:
 
 I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or 
 another will detect when you have a regex containing ^ at its start, $ at 
 the end, | somewhere in the middle, and no [] to disambiguate.

I think conceptually the beginning and the end of a string feels like a 
bracketing construct (only without symmetrical symbols).  At least that seems 
to be my instinct.  Well, it doesn't in / ^foo | ^bar | ^qux /, but in 
something like /^ foo|bar $/, the context immediately implies a higher 
precedence for ^ and $.  Maybe something like // foo|bar // could work as a 
bracketing version?

 You know, this problem would go away, almost entirely, if we had a :f[ull] 
 adverb for regex matching that imposed ^[...]$ around the entire match. 

I was thinking of that too.

 I suspect :full would almost always be associated with TOP, in fact. Boy am
 I tired of typing ^ and $ in TOP ;-)

Does it make sense for ^[...]$ to be assumed in TOP by default?  (Though not 
necessary if there's a shortcut like //...//.)


-David



Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Patrick R. Michaud
On Thu, Aug 05, 2010 at 10:27:50AM -0400, Aaron Sherman wrote:
 On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:
  I see this particular thinko a lot, though. Maybe some Perl 6 lint
  tool or another will detect when you have a regex containing ^ at its
  start, $ at the end, | somewhere in the middle, and no [] to
  disambiguate.
 
 You know, this problem would go away, almost entirely, if we had a :f[ull]
 adverb for regex matching that imposed ^[...]$ around the entire match. Then
 your code becomes:
 
   m:f/[A..Z]+|[a..z]+/

There's a version of this already.  Matching against an explicit 'regex', 
'token', or 'rule' automatically anchors it on both ends.  Thus:

$string ~~ regex { [A..Z]+ | [a..z]+ }

is equivalent to

$string ~~ regex { ^ [ [A..Z]+ | [a..z]+ ] $ }

Pm


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Aaron Sherman
On Thu, Aug 5, 2010 at 11:09 AM, Patrick R. Michaud pmich...@pobox.com wrote:

 On Thu, Aug 05, 2010 at 10:27:50AM -0400, Aaron Sherman wrote:
  On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:
   I see this particular thinko a lot, though. Maybe some Perl 6 lint
   tool or another will detect when you have a regex containing ^ at its
   start, $ at the end, | somewhere in the middle, and no [] to
   disambiguate.
 
  You know, this problem would go away, almost entirely, if we had a :f[ull]
  adverb for regex matching that imposed ^[...]$ around the entire match. Then
  your code becomes:
 
m:f/[A..Z]+|[a..z]+/

 There's a version of this already.  Matching against an explicit 'regex',
 'token', or 'rule' automatically anchors it on both ends.  Thus:

$string ~~ regex { [A..Z]+ | [a..z]+ }

 is equivalent to

$string ~~ regex { ^ [ [A..Z]+ | [a..z]+ ] $ }


While that's a nifty special case (I'm sure it will surprise me someday, and
I'll spend a half hour debugging before I remember this mail), it doesn't
help in the general case (see my example grammar, below).

After doing some more thinking and comparing this to other languages
(Python, for example, has re.match, which matches only at the start of a
string), it seems to me that there is a sort of out-of-band need to have a
more general solution at match time. Here's my second pass suggestion:

 m:r / m:rooted -- Match is rooted on both ends (^...$)
 m:rs / m:rootedstart -- Match is rooted at the start of string (^, à la Python re.match)
 m:re / m:rootedend -- Match is rooted at the end of string ($)
 m:rn / m:rootednone -- Match is not rooted (default)
 m:o / m:oneline -- Modify :r and friends to use ^^/$$
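The semantics of these proposed adverbs can be pinned down with a small Python sketch. The function `rooted_search` and its `mode`/`oneline` parameters are hypothetical stand-ins for `:r`, `:rs`, `:re`, `:rn` and `:o`; `\A` and `\Z` anchor to the whole string, while `re.MULTILINE` with `^`/`$` stands in for `^^`/`$$`. Mode `"start"` corresponds to Python's `re.match`.

```python
import re


def rooted_search(pattern: str, s: str, mode: str = "none", oneline: bool = False):
    """Hypothetical sketch of the proposed :r/:rs/:re/:rn adverbs.

    mode: "both" (:r), "start" (:rs), "end" (:re) or "none" (:rn, default).
    oneline: like the proposed :o, anchor at line boundaries instead of
    string boundaries (re.MULTILINE standing in for Perl 6's ^^ and $$).
    """
    start, end = ("^", "$") if oneline else (r"\A", r"\Z")
    flags = re.MULTILINE if oneline else 0
    wrapped = {
        "both": f"{start}(?:{pattern}){end}",
        "start": f"{start}(?:{pattern})",
        "end": f"(?:{pattern}){end}",
        "none": pattern,
    }[mode]
    return re.search(wrapped, s, flags)


print(rooted_search("GET ", "GET /index.html HTTP/1.0", "start") is not None)  # True
print(rooted_search(r"\.txt", "notes.txt.bak", "end"))                        # None
print(rooted_search("[a-z]+", "ABC\nxyz\n", "both", oneline=True).group())    # xyz
```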

Here's one way I can see that being routinely used:

 # Simplistic shell scripts
 rule TOP :r {<stmt>*} # Match the whole script
 rule stmt :r :o { <cmd> <arg>* } # One statement per line

The other way to go about that would be with parameterized adverbs. I'm not
sure how comfy people are with those, but they're in the spec. So this:

 m:r / m:rooted -- Match is rooted (default is ^...$)
    Parameters:
    :s / :start -- Match is rooted only at start (^)
    :e / :end -- Match is rooted only at end ($)
    [note: :s :e should produce a warning]
    :n / :none -- Match is not rooted (null modifier)
    [note: combining :n with :s or :e should warn]
    :o / :oneline -- Use ^^ and $$ instead of ^ and $
    [note: combining :o with :n should warn?]
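The warning rules in these notes amount to a small validation table. A hypothetical Python sketch follows; the function name and the returned mode strings are invented. It assumes `:s` plus `:e` warns because it just restates plain `:r`, and it arbitrarily lets `:n` win when it conflicts with `:s` or `:e`.

```python
import warnings


def check_rooted_params(start=False, end=False, none=False, oneline=False):
    """Hypothetical validator for the parameterized :r(...) adverb.

    Emits the warnings proposed in the notes and returns the effective mode.
    """
    if start and end:
        warnings.warn(":s together with :e is redundant; it is plain :r")
    if none and (start or end):
        warnings.warn(":n combined with :s or :e is contradictory")
    if oneline and none:
        warnings.warn(":o has no effect together with :n")
    if none:
        return "none"  # assumption: :n wins over :s/:e after warning
    if start and not end:
        return "start"
    if end and not start:
        return "end"
    return "both"  # plain :r, or :s and :e together
```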

So our statement matching grammar becomes:

 rule TOP :r {<stmt>*}
 rule stmt :r(:o) { <cmd> <arg>* }

The clown nose is just a side benefit ;-)

Seriously, though, I prefer :r(:o) because :r:o looks like it should be the
opposite of :rw (there is no :ro, as far as I know).

PS: I see no reason that any of this is needed for 6.0.0

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Tyler Curtis
On Thu, Aug 5, 2010 at 12:28 PM, Aaron Sherman a...@ajs.com wrote:
 While that's a nifty special case (I'm sure it will surprise me someday, and
 I'll spend a half hour debugging before I remember this mail), it doesn't
 help in the general case (see my example grammar, below).

In the general case, no. In the case of your grammar, and all
grammars, it does help.

All regex routines, when called standalone, are anchored to the
beginning and end of the string. So, having ^ and $ at the
beginning and end of your TOP is a no-op unless some other rule calls
it as a subrule.

S05 says: "In general, the anchoring of any subrule call is controlled
by its calling context. When a regex, token, or rule method is called
as a subrule, the front is anchored to the current position (as with
:p), while the end is not anchored, since the calling context will
likely wish to continue parsing. However, when such a method is
smartmatched directly, it is automatically anchored on both ends to
the beginning and end of the string." and that "The basic rule of
thumb is that the keyword-defined methods never do implicit .*?-like
scanning, while the m// and s// quotelike forms do such scanning in
the absence of explicit anchoring."

Given that Grammar.parse is specified to create a new Grammar object
and directly match its TOP method (or the method named by the :rule
adverb), without any specification that it does implicit .*?-like
scanning, I think that Grammar.parse should always anchor. This
doesn't appear to work quite properly in Rakudo currently: it anchors
to the beginning but not to the end. I'm about to check whether
there's a rakudobug for this already, and submit one if not.
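The three anchoring contexts in the S05 passage have rough Python counterparts, shown here as an analogy only (this is not how Rakudo implements them): `Pattern.search` scans like `m//`, `Pattern.match(string, pos)` anchors at a given position but leaves the end open like a subrule call with `:p`, and `Pattern.fullmatch` anchors both ends like a direct smartmatch against a `regex`, `token` or `rule`.

```python
import re

word = re.compile(r"[a-z]+")
text = "GET index html"

# m// style: unanchored, scans forward for the first place the pattern fits.
print(word.search(text).group())        # index

# Subrule style: anchored at the current position (like :p), with the end left
# open so the caller can keep parsing. Pattern.match(string, pos) works this way.
print(word.match(text, 4).group())      # index
print(word.match(text, 3))              # None (position 3 is a space)

# Direct smartmatch style: anchored at both ends of the whole string.
print(word.fullmatch(text))             # None
print(word.fullmatch("index").group())  # index
```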

 After doing some more thinking and comparing this to other languages
 (python, for example has match which matches only at the start of a
 string), it seems to me that there is a sort of out-of-band need to have a
 more general solution at match time. Here's my second pass suggestion:

  m:r / m:rooted -- Match is rooted on both ends (^...$)
  m:rs / m:rootedstart -- Match is rooted at the start of string (^, à la Python re.match)
  m:re / m:rootedend -- Match is rooted at the end of string ($)
  m:rn / m:rootednone -- Match is not rooted (default)
  m:o / m:oneline -- Modify :r and friends to use ^^/$$

 Here's one way I can see that being routinely used:

  # Simplistic shell scripts
  rule TOP :r {<stmt>*} # Match the whole script
  rule stmt :r :o { <cmd> <arg>* } # One statement per line

:oneline or similar might be useful. I'm not sure about :rootedend and
:rootedstart. :rooted is useful only in one situation: when implicitly
matching against the topic. You could do m:r/ foo /; to match
against the topic, but regex { foo }; would not do what you want (I
think). I don't know if doing an anchored match against the topic is
really important enough to justify an adverb just so you don't have to
do $_ ~~ regex { foo }.


 The other way to go about that would be with parameterized adverbs. I'm not
 sure how comfy people are with those, but they're in the spec. So this:

  m:r / m:rooted -- Match is rooted (default is ^...$)
    Parameters:
    :s / :start -- Match is rooted only at start (^)
    :e / :end -- Match is rooted only at end ($)
    [note: :s :e should produce a warning]
    :n / :none -- Match is not rooted (null modifier)
    [note: combining :n with :s or :e should warn]
    :o / :oneline -- Use ^^ and $$ instead of ^ and $
    [note: combining :o with :n should warn?]

 So our statement matching grammar becomes:

  rule TOP :r {<stmt>*}
  rule stmt :r(:o) { <cmd> <arg>* }

 The clown nose is just a side benefit ;-)

 Seriously, though, I prefer :r(:o) because :r:o looks like it should be the
 opposite of :rw (there is no :ro, as far as I know).

 PS: I see no reason that any of this is needed for 6.0.0


-- 
Tyler Curtis


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Darren Duncan

David Green wrote:

On 2010-08-05, at 8:27 am, Aaron Sherman wrote:

On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:

I see this particular thinko a lot, though. Maybe some Perl 6 lint tool
or another will detect when you have a regex containing ^ at its start, $
at the end, | somewhere in the middle, and no [] to disambiguate.


I think conceptually the beginning and the end of a string feels like a
bracketing construct (only without symmetrical symbols).  At least that seems
to be my instinct.  Well, it doesn't in / ^foo | ^bar | ^qux /, but in
something like /^ foo|bar $/, the context immediately implies a higher
precedence for ^ and $.  Maybe something like // foo|bar // could work as a
bracketing version?


Personally, I had always considered ^ and $ to be the lowest-precedence
things in a pattern.  But having seen David's example here, which I had
never realized was possible, I can understand the flexibility one gains
from that not being so.

-- Darren Duncan


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Aaron Sherman
On Thu, Aug 5, 2010 at 2:43 PM, Tyler Curtis ekir...@gmail.com wrote:

 On Thu, Aug 5, 2010 at 12:28 PM, Aaron Sherman a...@ajs.com wrote:
  While that's a nifty special case (I'm sure it will surprise me someday, and
  I'll spend a half hour debugging before I remember this mail), it doesn't
  help in the general case (see my example grammar, below).

 In the general case, no. In the case of your grammar, and all
 grammars, it does help.

 All regex routines, when called standalone, are anchored to the
 beginning and end of the string. So, having ^ and $ at the
 beginning and end of your TOP is a no-op unless some other rule calls
 it as a subrule.


There's something deeply disturbing to me in that... but I can't fully
express what it is. It just feels like I'm going to end up debugging
mountains of code, written by people who didn't understand that that was the
case.

Several times over the past few weeks, I've mentioned something on this list
only to find that, buried somewhere deep in a synopsis, there was a special
case I was unaware of.

The sheer volume of silent special cases in Perl 6 appears to be dwarfing
that of Perl 5, but perhaps that's just because I know Perl 5 far better
than I know Perl 6.

Mind you, I'm not complaining, so much as working out how I feel out
loud. Am I the only one who feels this way at this point?



 :oneline or similar might be useful. I'm not sure about :rootedend and
 :rootedstart.


Are you saying that you can't think of examples of where you want to root a
regex only to the start or end, or that you just don't think you need an
adverb to do it? If the former, then I submit the 1536 examples of matching
only at the end of strings in my local Perl library (mostly for matching
whitespace or filename extensions it looks like) and the 3199 examples of
matching only at the start which includes headers of all types (RFC2822 and
friends, HTTP, CPAN configs, etc.), whitespace, command sequence matching
(e.g. /^GET /) and so on.

If the latter, then I guess you and I just have a different take, here, and
that's fine. I respect your opinion, but in this case, I happen to disagree.

PS: You can also search through any typical python install for \.match
which will yield quite a lot of additional examples. I don't know Ruby or
Java very well, or I'd go looking for examples there too.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs