Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns
Right. Now I remember old threads where people would argue that POD parsers should do exactly the same as the Perl parser. IIRC the conclusion was that using something like PPI to handle pathological cases like multiline strings or here-docs would be overkill, so POD is whatever starts with a (valid) POD directive. The only thing that perhaps could be changed is to skip a __DATA__ section (but keep parsing, since there may be POD after __END__!).

I see a potential way of resolving this, but it looks like quite a big effort: the Perl parser could store all the text it skips as POD in a structure similar to __DATA__, so that POD parsing utilities could use a pseudo-filehandle like this (reading POD for the current script):

    while (<main::__POD__>) { ... }

and for other files there could be a special open() discipline that returns only POD, using the same parser.

What do you think?

-Marek

-------- Original message --------
From: Randy Stauner rwstau...@cpan.org
Date: 08.01.2015 19:26 (GMT+01:00)
To: David E. Wheeler da...@justatheory.com
Cc: Marek Rouchal ma...@rouchal.net, Karl Williamson pub...@khwilliamson.com, pod-people@perl.org
Subject: Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns
Re: Assume CP1252
* Grant McLean gr...@mclean.net.nz [2015-01-07T18:47:49]
> I also agree this is a good idea. None of the Latin-1 control characters that CP1252 replaces with printable characters should be appearing in POD anyway.

Seems safe, I think. At first, I thought, "They're disjunct!!" but then I realized that this is only true on codepoints that nobody is going to use in their Latin-1 POD.

--
rjbs
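The point about the two encodings differing only on rarely used codepoints can be checked with the core Encode module. This is a minimal sketch, not code from Pod::Simple; the two sample bytes are chosen purely for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte 0x93 is a C1 control character in Latin-1 but a curly quote in CP1252.
my $c1_byte   = "\x93";
my $as_latin1 = decode('iso-8859-1', $c1_byte);
my $as_cp1252 = decode('cp1252',     $c1_byte);

# Byte 0xE9 (e with acute accent) decodes identically under both encodings.
my $accent_byte = "\xE9";
my $same = decode('iso-8859-1', $accent_byte) eq decode('cp1252', $accent_byte);

printf "0x93 as Latin-1: U+%04X\n", ord $as_latin1;
printf "0x93 as CP1252:  U+%04X\n", ord $as_cp1252;
printf "0xE9 identical:  %s\n", $same ? 'yes' : 'no';
```

Only bytes in the 0x80 to 0x9F range are affected, which is why assuming CP1252 should not change how ordinary Latin-1 POD text decodes.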
Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns
* David E. Wheeler da...@justatheory.com [2015-01-08T00:38:04]
> I agree that’s too liberal. I suggest /\A=([a-zA-Z]+\d*)\b/

<trolling?>
Surely you want [0-9] instead of \d, lest we end up with =head୩!
</trolling?>

--
rjbs
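The \d concern above is easy to reproduce. The sketch below compares the suggested pattern with an ASCII-only variant; the variable names are made up for this example, and the Oriya digit is written as an escape so the script does not depend on source encoding.

```perl
use strict;
use warnings;

# "=head" followed by U+0B69 ORIYA DIGIT THREE.
my $line = "=head\x{0B69}";

my $with_backslash_d = qr/\A=([a-zA-Z]+\d*)\b/;     # pattern suggested above
my $with_ascii_class = qr/\A=([a-zA-Z]+[0-9]*)\b/;  # ASCII digits only

# \d follows Unicode rules on character strings, so it accepts the Oriya digit.
my $loose_match  = $line =~ $with_backslash_d ? 1 : 0;
my $strict_match = $line =~ $with_ascii_class ? 1 : 0;

# An ordinary command still matches the ASCII-only variant.
my $ordinary = "=head2 Title" =~ $with_ascii_class ? $1 : undef;

printf "\\d    matches: %d\n", $loose_match;
printf "[0-9] matches: %d\n", $strict_match;
```

On Perl 5.14 and later, the /a modifier is another way to restrict \d to ASCII digits.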
Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns
IIRC the first liberal rx is there to detect the start of POD just like the Perl (language) parser does, i.e. it pauses parsing for instructions until the next =cut.

I think POD parsers should do the same. If the matched pod-start sequence does not match any of the known commands, it's an error condition, and we should discuss what to do then, like
- throw exception
- print error and/or call error callback
- warn and treat the content as a plain text paragraph

-Marek

-------- Original message --------
From: David E. Wheeler da...@justatheory.com
Date: 08.01.2015 06:39 (GMT+01:00)
To: Karl Williamson pub...@khwilliamson.com
Cc: Randy Stauner rwstau...@cpan.org, pod-people@perl.org
Subject: Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns
Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns
On Jan 7, 2015, at 10:18 PM, Marek Rouchal ma...@rouchal.net wrote:

> IIRC the first liberal rx is to detect start of POD just like the Perl (language) parser does, i.e. it pauses parsing for instructions until the next =cut

Oh. Can someone dig into the Perl parser and confirm this?

> I think POD parsers should do the same.

My suspicion is that, even if that’s true, the parser ignores everything in a __DATA__ or __END__ block. Anyway, even if Perl is more lenient, that doesn’t mean a Pod parser needs to be. What is and is not valid Pod is quite well-defined in perlpodspec, so I suspect that we can afford to be a bit stricter.

> If the matched pod-start sequence does not match any of the known commands, it's an error condition, and we should discuss what to do then, like
> - throw exception
> - print error and/or call error callback
> - warn and treat the content as a plain text paragraph

It might be valid Perl.

    my $foo = q{
    =sîî
    };

So I think it would be better just to be stricter in what we consider to be Pod.

Best,

David
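One way to read "stricter" is to accept a line as Pod only when it opens with a command that perlpodspec defines. The sketch below is an illustration of that idea, not a proposed patch: starts_pod_command is a hypothetical helper, and the command list is an assumption that should be checked against the perlpodspec shipped with your perl.

```perl
use strict;
use warnings;

# Command names as listed in perlpodspec (assumed complete for this sketch).
my %KNOWN_COMMAND = map { $_ => 1 } qw(
    pod head1 head2 head3 head4 over item back begin end for encoding cut
);

# Hypothetical helper: true only if $line opens with a known Pod command
# followed by whitespace or the end of the string.
sub starts_pod_command {
    my ($line) = @_;
    return 0 unless $line =~ /\A=([a-zA-Z][a-zA-Z0-9]*)(?:\s|\z)/;
    return $KNOWN_COMMAND{$1} ? 1 : 0;
}

for my $line ("=head1 NAME\n", "=cut", "=s\x{EE}\x{EE}\n", "=\x00\x01binary") {
    printf "%-8s %s\n", (starts_pod_command($line) ? 'pod' : 'not pod'),
        join ' ', map { sprintf '%02X', ord } split //, substr($line, 0, 6);
}
```

With this check, the =sîî line inside the q{} string above is simply not Pod, so no error handling is needed for it.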
Allow Whitespace in L<> URLs?
Poders,

RT #93491 (https://rt.cpan.org/Ticket/Display.html?id=93491) reports that URLs are mis-formatted when they contain a newline. Turns out, the regex that detects a URL in L<> explicitly forbids whitespace:

    next unless $ell->[$_] =~ m/^(?:([^|]*)\|)?(\w+:[^:\s]\S*)$/s;

https://github.com/theory/pod-simple/blob/master/lib/Pod/Simple.pm#L1102

I think that is probably sane, but maybe there are other opinions? Should we allow whitespace in L<> URLs? If so, I think we would just change \S to .

Thoughts?

David
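To see what the quoted pattern accepts, it can be run against a few L<> bodies outside Pod::Simple. In this sketch url_part is a hypothetical helper, and $lenient_rx is just the "\S to ." change floated above, not anything that exists in the module.

```perl
use strict;
use warnings;

# Pattern as quoted above from Pod::Simple.
my $url_rx     = qr/^(?:([^|]*)\|)?(\w+:[^:\s]\S*)$/s;
# Variant under discussion: "." in place of "\S" (with /s, "." matches newlines).
my $lenient_rx = qr/^(?:([^|]*)\|)?(\w+:[^:\s].*)$/s;

# Hypothetical helper: return the URL part if $content looks like a URL link.
sub url_part {
    my ($rx, $content) = @_;
    return $content =~ $rx ? $2 : undef;
}

my $plain   = "foo bar|http://baz-bar";
my $wrapped = "foo bar|http://baz.com/foo\nbar";

for my $case ([current => $url_rx], [lenient => $lenient_rx]) {
    my ($name, $rx) = @$case;
    printf "%-8s plain:   %s\n", $name, defined url_part($rx, $plain)   ? 'URL' : 'no';
    printf "%-8s wrapped: %s\n", $name, defined url_part($rx, $wrapped) ? 'URL' : 'no';
}
```

The lenient variant would pass the embedded newline through as part of the URL, so a formatter would still have to decide whether to strip or escape it.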
Re: Pod::Simple can treat binary as pod due to liberal/inconsistent regexp patterns
On 01/08/2015 11:17 AM, Randy Stauner wrote:

>>> IIRC the first liberal rx is to detect start of POD just like the Perl (language) parser does, i.e. it pauses parsing for instructions until the next =cut
>>
>> Oh. Can someone dig into the Perl parser and confirm this?
>>
>>> I think POD parsers should do the same.
>>
>> My suspicion is that, even if that’s true, the parser ignores everything in a __DATA__ or __END__ block.
>
> Here is an example I worked up when writing tests for metacpan: everything after __DATA__ is data, but the pod parser will also find pod if it's there.
> https://gist.github.com/rwstauner/98f97e6cd64c972d9b71

I don't understand the parser very well, but if someone wants a crack at it, here is the only portion of it that sets the in-pod state. The context is that the first character on the line is an =, and tmp holds the character that follows that =. I think 's' points to the input starting at tmp, so that tmp == *s:

    if (PL_expect == XSTATE && isALPHA(tmp) &&
        (s == PL_linestart+1 || s[-2] == '\n') )
    {
        /* string eval or nested lexing: scan the current buffer for "=cut" */
        if ((PL_in_eval && !PL_rsfp && !PL_parser->filtered)
            || PL_lex_state != LEX_NORMAL) {
            d = PL_bufend;
            while (s < d) {
                if (*s++ == '\n') {
                    incline(s);
                    if (strnEQ(s,"=cut",4)) {
                        s = strchr(s,'\n');
                        if (s)
                            s++;
                        else
                            s = d;
                        incline(s);
                        goto retry;
                    }
                }
            }
            goto retry;
        }
        /* otherwise: drop the rest of the buffer and flag that we are in pod */
        s = PL_bufend;
        PL_parser->in_pod = 1;
        goto retry;
    }
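A rough Perl approximation of that check shows why a line-based scanner also finds pod after __DATA__. This is a sketch under stated assumptions: pod_lines is a hypothetical helper, it ignores the PL_expect == XSTATE condition entirely (so it would also fire inside multiline strings and here-docs), and it assumes a "=cut" line not followed by a letter ends the pod block.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines a purely line-based scan treats as Pod.
sub pod_lines {
    my ($source) = @_;
    my $in_pod = 0;
    my @pod;
    for my $line (split /^/, $source) {
        if ($in_pod) {
            push @pod, $line;
            # Assumption: "=cut" not followed by a letter resumes code.
            $in_pod = 0 if $line =~ /\A=cut(?![A-Za-z])/;
        }
        elsif ($line =~ /\A=[A-Za-z]/) {
            # Mirrors the check above: "=" in column one, then an ASCII letter.
            $in_pod = 1;
            push @pod, $line;
        }
    }
    return @pod;
}

my $source = join "\n",
    'my $x = 1;', '=head1 NAME', '', 'text', '', '=cut',
    'print $x;', '__DATA__', '=head1 In data', '';

my @pod = pod_lines($source);
print scalar(@pod), " lines treated as Pod\n";
print "last Pod line: $pod[-1]";
```

The scan has no notion of __DATA__, so the "=head1 In data" line is reported as pod even though perl itself would hand it to the DATA filehandle.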
Re: Allow Whitespace in L<> URLs?
* David E. Wheeler da...@justatheory.com [2015-01-08T13:42:10]
> I think that is probably sane, but maybe there are other opinions? Should we allow whitespace in L<> URLs? If so, I think we would just change \S to .

I didn't scrutinize the regexp (which is present in perlpodspec) closely, but URLs may not contain unescaped spaces, so I think there's no reason to allow it.

    L<foo bar|http://baz-bar>      should be okay
    L<foo bar|http://baz bar>      should not
    L<foo bar | http://baz-bar>    unclear from quick skim of spec

I assume the second case is what came up. It's not a valid URI, by my reading of https://tools.ietf.org/html/rfc3986#appendix-A

--
rjbs
Re: Allow Whitespace in L<> URLs?
On Jan 8, 2015, at 1:00 PM, Ricardo Signes perl@rjbs.manxome.org wrote:

> I didn't scrutinize the regexp (which is present in perlpodspec) closely, but URLs may not contain unescaped spaces, so I think there's no reason to allow it.
>
>     L<foo bar|http://baz-bar>      should be okay
>     L<foo bar|http://baz bar>      should not
>     L<foo bar | http://baz-bar>    unclear from quick skim of spec
>
> I assume the second case is what came up. It's not a valid URI, by my reading of https://tools.ietf.org/html/rfc3986#appendix-A

IIUC, the case that came up was

    L<foo bar|http://baz.com/foo bar>

I am kind of inclined to just say that such things are verboten. However, ticket 95710 offers up this example:

    L<DEMO with NL|
    /DEMO> and trailing text

It’s not just URLs that we need to decide how to deal with, I guess.

David
Re: Allow Whitespace in L<> URLs?
On Thu, 8 Jan 2015 10:42:10 -0800 David E. Wheeler da...@justatheory.com wrote:

> I think that is probably sane, but maybe there are other opinions? Should we allow whitespace in L<> URLs?

URLs use + or %20 for spaces. There is no need for whitespace in a URL.

--
Don't stop where the ink does.

Shawn
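For authors who hit this, the escaping suggested above can be done before the URL goes into L<>. A minimal sketch follows; escape_url_whitespace is a hypothetical helper written for this example, it percent-encodes only spaces and tabs, and the pattern at the end is the one quoted earlier in the thread from Pod::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode spaces and tabs so the URL contains
# no literal whitespace.
sub escape_url_whitespace {
    my ($url) = @_;
    $url =~ s/([ \t])/sprintf('%%%02X', ord $1)/ge;
    return $url;
}

my $escaped = escape_url_whitespace('http://baz.com/foo bar');
my $link    = sprintf 'L<%s|%s>', 'foo bar', $escaped;

# The escaped URL satisfies the pattern quoted earlier in the thread.
my $accepted = "foo bar|$escaped" =~ m/^(?:([^|]*)\|)?(\w+:[^:\s]\S*)$/s ? 1 : 0;

print "$link\n";
print "accepted by URL pattern: ", ($accepted ? 'yes' : 'no'), "\n";
```

This keeps the whitespace out of the Pod source entirely, which is consistent with leaving the existing pattern as it is.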