Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns

2015-01-08 Thread Marek Rouchal
Right. Now I remember old threads where people argued that POD parsers 
should do exactly the same as the Perl parser. IIRC the conclusion was 
that using something like PPI to handle pathological cases such as multiline 
strings or here-docs would be overkill, so POD is whatever starts with a 
(valid) POD directive. The only thing that could perhaps be changed is to skip 
a __DATA__ section (but keep parsing, since there may be POD after __END__!)

I see a potential way of resolving this, but it looks like quite a big effort: 
the Perl parser could store all text it skips as POD in a structure similar 
to __DATA__, so that POD parsing utilities could use a pseudo-filehandle like 
this (reading POD for the current script):

    while (<main::__POD__>) {
        ...
    }

and for other files there could be a special open() discipline to return only 
POD using the same parser. What do you think?

-Marek




-------- Original message --------
From: Randy Stauner rwstau...@cpan.org
Date: 2015-01-08 19:26 (GMT+01:00)
To: David E. Wheeler da...@justatheory.com
Cc: Marek Rouchal ma...@rouchal.net, Karl Williamson
pub...@khwilliamson.com, pod-people@perl.org
Subject: Re: Pod::Simple can treat binary as pod due to liberal/inconsistent
regexp patterns



Re: Assume CP1252

2015-01-08 Thread Ricardo Signes
* Grant McLean gr...@mclean.net.nz [2015-01-07T18:47:49]
> I also agree this is a good idea.  None of the Latin-1 control
> characters that CP1252 replaces with printable characters should be
> appearing in POD anyway.

Seems safe, I think.  At first, I thought, "They're disjunct!!" but then I
realized that this is only true on codepoints that nobody is going to use in
their Latin-1 POD.

-- 
rjbs




Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns

2015-01-08 Thread Ricardo Signes
* David E. Wheeler da...@justatheory.com [2015-01-08T00:38:04]
> I agree that’s too liberal. I suggest
>
>     /\A=([a-zA-Z]+\d*)\b/

<trolling?>
Surely you want [0-9] instead of \d, lest we end up with =head୩ !
</trolling?>
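
To make the \d point concrete, here is a minimal sketch (not from the thread) that compares the suggested pattern with an ASCII-only variant. The helper accepts() and the two pattern variables are names made up for this illustration, and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;

# The pattern as suggested, and the same pattern with [0-9] in place of \d.
my $unicode_digits = qr/\A=([a-zA-Z]+\d*)\b/;
my $ascii_digits   = qr/\A=([a-zA-Z]+[0-9]*)\b/;

# Hypothetical helper: 1 if $line matches $pattern, 0 otherwise.
sub accepts {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? 1 : 0;
}

# "=head" followed by U+0B69 ORIYA DIGIT THREE, built with chr() so the
# example does not depend on the encoding of this file.
my $odd = "=head" . chr(0x0B69);

for my $case (['=head1', '=head1'], ['=head+U+0B69', $odd]) {
    my ($label, $line) = @$case;
    printf "%-13s \\d: %d  [0-9]: %d\n",
        $label, accepts($unicode_digits, $line), accepts($ascii_digits, $line);
}
```

On a character string, \d matches any Unicode decimal digit, so only the [0-9] variant turns the second line away.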

-- 
rjbs




Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns

2015-01-08 Thread Marek Rouchal
IIRC the first liberal rx is there to detect the start of POD just like the 
Perl (language) parser does, i.e. it pauses parsing for instructions until the 
next =cut.
I think POD parsers should do the same. If the matched pod-start sequence does 
not match any of the known commands, it's an error condition, and we should 
discuss what to do then, like 
- throw exception 
- print error and/or call error callback
- warn and treat the content as a plain text paragraph

-Marek




-------- Original message --------
From: David E. Wheeler da...@justatheory.com
Date: 2015-01-08 06:39 (GMT+01:00)
To: Karl Williamson pub...@khwilliamson.com
Cc: Randy Stauner rwstau...@cpan.org, pod-people@perl.org
Subject: Re: Pod::Simple can treat binary as pod due to liberal/inconsistent
regexp patterns



Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns

2015-01-08 Thread David E. Wheeler
On Jan 7, 2015, at 10:18 PM, Marek Rouchal ma...@rouchal.net wrote:

> IIRC the first liberal rx is to detect start of POD just like the Perl 
> (language) parser does, i.e. it pauses parsing for instructions until the 
> next =cut

Oh. Can someone dig into the Perl parser and confirm this?

> I think POD parsers should do the same.

My suspicion is that, even if that’s true, the parser ignores everything in a 
__DATA__ or __END__ block.

Anyway, even if Perl is more lenient, that doesn’t mean a Pod parser needs to 
be. What is and is not valid Pod is quite well-defined in perlpodspec, so I 
suspect that we can afford to be a bit stricter.

> If the matched pod-start sequence does not match any of the known commands, 
> it's an error condition, and we should discuss what to do then, like 
> - throw exception 
> - print error and/or call error callback
> - warn and treat the content as a plain text paragraph

It might be valid Perl.

my $foo = q{
=sîî
};

So I think it would be better just to be stricter in what we consider to be Pod.
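
One way to picture "stricter" is sketched below. This is only an illustration under stated assumptions: the helper is_pod_command() is hypothetical, the command list is one reading of perlpodspec, and the pattern ends in (?:\s|\z) rather than the \b suggested earlier in the thread.

```perl
use strict;
use warnings;

# Assumed command list, taken from a reading of perlpodspec.
my %KNOWN = map { $_ => 1 }
    qw(pod cut head1 head2 head3 head4 over item back begin end for encoding);

# Hypothetical helper: 1 only if the line starts with '=', then ASCII
# letters and optional ASCII digits that name a known command, followed
# by whitespace or the end of the string.
sub is_pod_command {
    my ($line) = @_;
    return 0 unless $line =~ /\A=([a-zA-Z]+[0-9]*)(?:\s|\z)/;
    return $KNOWN{$1} ? 1 : 0;
}

# "=s\x{EE}\x{EE}" is the "=sîî" line from the q{} example above.
for my $line ("=head1 NAME\n", "=cut\n", "=foo\n", "=s\x{EE}\x{EE}\n") {
    (my $shown = $line) =~ s/[^\x20-\x7E]/?/g;    # keep the output ASCII
    printf "%-12s %s\n", $shown, is_pod_command($line) ? "pod" : "not pod";
}
```

A check like this still cannot tell that "=head1" inside a q{} string is not Pod; it only narrows what counts as a command.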

Best,

David





Allow Whitespace in L<> URLs?

2015-01-08 Thread David E. Wheeler
Poders,

RT #93491 (https://rt.cpan.org/Ticket/Display.html?id=93491) reports that URLs 
are mis-formatted when they contain a newline. Turns out, the regex that 
detects a URL in L<> explicitly forbids whitespace:

    next unless $ell->[$_] =~ m/^(?:([^|]*)\|)?(\w+:[^:\s]\S*)$/s;

  https://github.com/theory/pod-simple/blob/master/lib/Pod/Simple.pm#L1102

I think that is probably sane, but maybe there are other opinions? Should we 
allow whitespace in L<> URLs? If so, I think we would just change \S to .
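
For reference, a small sketch of how the current pattern treats link contents with and without whitespace. The wrapper parse_url_link() is a hypothetical name used only here; Pod::Simple applies the pattern inline to its own data structure.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern quoted above. Returns
# (text, url) when the contents look like a URL link, else an empty list.
sub parse_url_link {
    my ($content) = @_;
    return $content =~ m/^(?:([^|]*)\|)?(\w+:[^:\s]\S*)$/s ? ($1, $2) : ();
}

for my $content (
    "foo bar|http://baz-bar",
    "foo bar|http://baz bar",
    "foo bar|http://baz.com/foo\nbar",
) {
    my @parts = parse_url_link($content);
    (my $shown = $content) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-34s %s\n", $shown, @parts ? "URL: $parts[1]" : "not a URL";
}
```

Only the first of the three is recognized as a URL link; a space or a newline inside the URL part makes the match fail.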

Thoughts?

David





Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns

2015-01-08 Thread Karl Williamson

On 01/08/2015 11:17 AM, Randy Stauner wrote:

> > > IIRC the first liberal rx is to detect start of POD just like the Perl
> > > (language) parser does, i.e. it pauses parsing for instructions until
> > > the next =cut
> >
> > Oh. Can someone dig into the Perl parser and confirm this?
> >
> > > I think POD parsers should do the same.
> >
> > My suspicion is that, even if that’s true, the Parser ignores
> > everything in a __DATA__ or __END__ block.
>
> Here is an example I worked up when writing test for metacpan:
> Everything after __DATA__ is data, but the pod parser will also find pod
> if it's there
> https://gist.github.com/rwstauner/98f97e6cd64c972d9b71



I don't understand the parser very well, but if someone wants a crack at 
it, here is the only portion of it that marks the parser as being in pod. 
The context is that the first character on the line is an =, and tmp holds 
the character that follows that =.  I think 's' points to the input 
starting at tmp, so that tmp == *s:


    if (PL_expect == XSTATE && isALPHA(tmp) &&
        (s == PL_linestart+1 || s[-2] == '\n') )
    {
        if ((PL_in_eval && !PL_rsfp && !PL_parser->filtered)
            || PL_lex_state != LEX_NORMAL) {
            d = PL_bufend;
            while (s < d) {
                if (*s++ == '\n') {
                    incline(s);
                    if (strnEQ(s,"=cut",4)) {
                        s = strchr(s,'\n');
                        if (s)
                            s++;
                        else
                            s = d;
                        incline(s);
                        goto retry;
                    }
                }
            }
            goto retry;
        }
        s = PL_bufend;
        PL_parser->in_pod = 1;
        goto retry;
    }
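
For readers who would rather not trace the C, here is a rough line-based Perl sketch of the same idea. It is a simplification under stated assumptions, not a port: code_lines() is a hypothetical helper, it ignores the PL_expect == XSTATE check entirely (so it would also "find" pod inside multi-line strings and here-docs), and it assumes a pod block ends with the next line that starts with =cut.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines that would be left as code
# once pod blocks are skipped.
sub code_lines {
    my @lines = @_;
    my $in_pod = 0;
    my @code;
    for my $line (@lines) {
        if ($in_pod) {
            # Assumption: a line starting with =cut closes the block.
            $in_pod = 0 if $line =~ /\A=cut(?![A-Za-z])/;
            next;
        }
        # '=' in column one followed by an ASCII letter, like isALPHA(tmp).
        if ($line =~ /\A=[A-Za-z]/) {
            $in_pod = 1;
            next;
        }
        push @code, $line;
    }
    return @code;
}

print code_lines(
    "my \$x = 1;\n",
    "=head1 NAME\n",
    "Some docs\n",
    "=cut\n",
    "print \$x;\n",
);
```

Note that the check is only "an = at the start of a line, then a letter"; nothing here looks at which command follows.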



Re: Allow Whitespace in L<> URLs?

2015-01-08 Thread Ricardo Signes
* David E. Wheeler da...@justatheory.com [2015-01-08T13:42:10]
> I think that is probably sane, but maybe there are other opinions? Should we
> allow whitespace in L<> URLs? If so, I think we would just change \S to .

I didn't scrutinize the regexp (which is present in perlpodspec) closely, but
URLs may not contain unescaped spaces, so I think there's no reason to allow it.

  L<foo bar|http://baz-bar>     should be okay
  L<foo bar|http://baz bar>     should not
  L<foo bar | http://baz-bar>   unclear from quick skim of spec

I assume the second case is what came up.  It's not a valid URI, by my reading
of https://tools.ietf.org/html/rfc3986#appendix-A

-- 
rjbs




Re: Allow Whitespace in L<> URLs?

2015-01-08 Thread David E. Wheeler
On Jan 8, 2015, at 1:00 PM, Ricardo Signes perl@rjbs.manxome.org wrote:

> I didn't scrutinize the regexp (which is present in perlpodspec) closely, but
> URLs may not contain unescaped spaces, so I think there's no reason to allow
> it.
>
>   L<foo bar|http://baz-bar>     should be okay
>   L<foo bar|http://baz bar>     should not
>   L<foo bar | http://baz-bar>   unclear from quick skim of spec
>
> I assume the second case is what came up.  It's not a valid URI, by my reading
> of https://tools.ietf.org/html/rfc3986#appendix-A

IIUC, the case that came up was

    L<foo bar|http://baz.com/foo
    bar>

I am kind of inclined to just say that such things are verboten. However, 
ticket 95710 offers up this example:

    L<DEMO with NL|
    /DEMO> and trailing text

It’s not just URLs that we need to decide how to deal with, I guess.

David





Re: Allow Whitespace in L<> URLs?

2015-01-08 Thread Shawn H Corey
On Thu, 8 Jan 2015 10:42:10 -0800
David E. Wheeler da...@justatheory.com wrote:

> I think that is probably sane, but maybe there are other opinions?
> Should we allow whitespace in L<> URLs?

URLs use + or %20 for spaces. There is no need for whitespace in a URL.


-- 
Don't stop where the ink does.
Shawn