Re: Module Proposal: Parse::Reversible
has anyone mentioned xeger yet in this discussion?
Re: Module Proposal: Parse::Reversible
On Wed, 25 Apr 2007 13:44:11 -0500 David Nicol wrote:
> has anyone mentioned xeger yet in this discussion?

Not heard of that, no. And it's unfortunately a little late now, behold:

http://search.cpan.org/~pevans/String-MatchInterpolate-0.01/

An initial version :) I'm still inviting comment on its behaviour though...

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
On 25 Apr 2007, at 21:18, Paul LeoNerd Evans wrote:
> I'm still inviting comment on its behaviour though...

I like it. After

    This regexp should not contain any capture brackets ( ) as these
    will confuse the parsing logic.

I'd add

    Instead use non-capturing brackets (?: ) to group subparts of the
    regexp.

just to avoid anyone cargo-culting the notion that you can't use brackets at all.

--
Andy Armstrong, hexten.net
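To illustrate the point about brackets, here is a minimal sketch. It assumes the module wraps each variable's sub-pattern in exactly one capture group and numbers them in order, as the code posted earlier in the thread does. The path and the TYPE/ID template are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical template: /img/${TYPE:(jpg|png)}/${ID:\d+}
# Each variable gets its own capture group, so a capturing group
# inside the TYPE sub-pattern shifts every later capture by one.
my $path = '/img/png/42';

my @shifted = $path =~ m{^/img/((jpg|png))/(\d+)$};
# The second capture is now the inner (jpg|png) group, not the ID
print "second capture with ( ):   $shifted[1]\n";

my @aligned = $path =~ m{^/img/((?:jpg|png))/(\d+)$};
# With (?: ) the second capture is the ID again
print "second capture with (?: ): $aligned[1]\n";
```

Grouping with (?: ) keeps the capture numbering in step with the variable order, which is all the suggested documentation line is asking for.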
Re: Module Proposal: Parse::Reversible
On Fri, Apr 20, 2007 at 06:50:03PM +0100, Paul LeoNerd Evans wrote:
> package Parse::Reversable;

I'm suddenly not so sure on the name any more...

It's not just parsing, it's not just interpolation. It's both. To name it after one of these operations ignores the other. So I think somewhere under either Text:: or String:: might be better. What's the difference between these two roots? Why might one be favoured over the other?

And what actual name? It pains me that my best attempt so far is

    String::ParsableInterpolable

Surely we can do better than that?

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
On Tue, Apr 24, 2007 at 01:31:02PM +0100, Paul LeoNerd Evans wrote:
> String::ParsableInterpolable
>
> Surely we can do better than that?

Actually, I'm not even sure on the parsable part now. Parsing would imply some sort of possibly-recursive, context-aware grammar system. This is much simpler - just literal strings with regexp-matched substrings in them. Perhaps more Matchable than Parsable.

String::MatchInterpolate maybe?

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
Paul LeoNerd Evans wrote:
> No, I think at this point we have to appeal to the core reason for creating this module in the first place; namely, that it is bidirectional. Parsing a string into variables, or interpolating the variables back into a string. Both can be done within one object, symmetrically. To introduce something that breaks that symmetry effectively removes the requirement that it be done within one object, at which point one might as well use two separate objects for each individual operation.

If the main objects your module will manipulate still are URIs, maybe it should be in the URI:: namespace. And couldn't the bidirectional relation you want to create be seen like a mapping? Hence URI::Mapper or something similar?

--
Sébastien Aperghis-Tramoni

Close the world, txEn eht nepO.
Re: Module Proposal: Parse::Reversible
On Tue, Apr 24, 2007 at 04:18:11PM +0200, Sébastien Aperghis-Tramoni wrote:
> If the main objects your module will manipulate still are URIs, maybe it should be in the URI:: namespace. And couldn't the bidirectional relation you want to create be seen like a mapping? Hence URI::Mapper or something similar?

Not necessarily. I was trying to keep it generic. URL/URIs, file paths, ID strings,... anything that has a simple string format which includes variables somehow embedded in it. LDAP DNs maybe?

    dn: cn=${NAME:\w+}, o=${UNIT:\w+}, dc=example, dc=com

I wanted to leave it generic at a string level.

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
On Tue, Apr 24, 2007 at 04:01:10PM +0100, Andy Armstrong wrote:
> Text::Transform::Reversible ?

Transform is too generic.. text goes in, other text goes out... That doesn't capture the essence of pattern matching (no pun intended :) ).

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
* Paul LeoNerd Evans [2007-04-24 14:35]:
> It's not just parsing, it's not just interpolation. It's both. To name it after one of these operations ignores the other. So I think somewhere under either Text:: or String:: might be better.

String::Template?

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: Module Proposal: Parse::Reversible
On 24 Apr 2007, at 18:07, A. Pagaltzis wrote:
>> So I think somewhere under either Text:: or String:: might be better.
>
> String::Template?

String::Template::Reversible maybe? String::Template sounds like a namespace rather than a module.

--
Andy Armstrong, hexten.net
Re: Module Proposal: Parse::Reversible
* Andy Armstrong [2007-04-24 19:15]:
> String::Template::Reversible maybe? String::Template sounds like a namespace rather than a module.

I don’t know what it means for something to “sound like a namespace.” :-)

Also, I think of templates as generally reversible anyway. Think of printf/scanf, strftime/strptime, URI::Template, etc. String templates often go both ways.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: Module Proposal: Parse::Reversible
On Tue, Apr 24, 2007 at 07:30:56PM +0200, A. Pagaltzis wrote:
> I don’t know what it means for something to “sound like a namespace.” :-)
>
> Also, I think of templates as generally reversible anyway. Think of printf/scanf, strftime/strptime, URI::Template, etc. String templates often go both ways.

Well, most of the decent-sized template modules on CPAN aren't inherently reversible... Consider Template::Toolkit, Text::Template, HTML::Template, Mason.

Also, String::Template sounds too much like Text::Template, which it isn't really.. It's a totally different idea.

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
* Paul LeoNerd Evans [2007-04-24 19:40]:
> Also, String::Template sounds too much like Text::Template, which it isn't really.. It's a totally different idea.

Not to me. “Text” to me means a document (or some arbitrarily small unit of a document) that has meaning to a human. A “String” is a data type consisting of an ordered sequence of characters – an entity in terms of computing.

Admittedly the difference may appear sort of hand-wavy, but to me, “Text::Template” and “String::Template” convey extremely different things.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: Module Proposal: Parse::Reversible
On Tue, April 24, 2007 11:05 am, Paul LeoNerd Evans said:
> On Tue, Apr 24, 2007 at 04:01:10PM +0100, Andy Armstrong wrote:
>> Text::Transform::Reversible ?
>
> Transform is too generic.. text goes in, other text goes out... That doesn't capture the essence of pattern matching (no pun intended :) ).

How about Text::Transform::ReversiblePattern ? Or even just Text::ReversiblePattern ?

Daniel T. Staal

---
This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law.
---
Re: Module Proposal: Parse::Reversible
On Tue, 24 Apr 2007 14:20:40 -0400 (EDT) Daniel T. Staal wrote:
> How about Text::Transform::ReversiblePattern ? Or even just Text::ReversiblePattern ?
>
> Daniel T. Staal

I am tempted by that, but I would prefer it in the String:: space. As A. Pagaltzis points out elsewhere in the thread, a string is just a sequence of characters, which is what we have here, whereas text would imply some higher-level human meaning.

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
On Fri, 20 Apr 2007 21:26:33 +0100 Andy Armstrong wrote:
> I agree that it's not massively general - but you could use it to
>
>  * generate fixed width fields
>  * truncate reals to ints
>  * specify the number of decimal places

On further thought, that all sounds very specific to numbers, and hard to generalise. It also breaks the symmetry of operations, as I have already noted.

Also, someone else objects by private email that:

    Surely you've just reintroduced what you were trying to eliminate
    in the first place: separate languages for describing patterns and
    for describing formatting.

This does indeed seem to be the case. If you wanted full formatting control, you might as well go with two separate strings; one for pattern capture, one for formatted interpolation.

> Given a sprintf format, it's easy to work out what the pattern for it would be, so can the pattern be optional?

This is equally difficult in this direction. Consider for a moment a cut-down case of UK Postal Codes, which would match a regexp

    ${POSTCODE:[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]}

Were we instead to provide just a printf format for this, the only one that comes to mind is '%s'. Hard to say how we'd ever know to generate that specific pattern from there.

No, I think at this point we have to appeal to the core reason for creating this module in the first place; namely, that it is bidirectional. Parsing a string into variables, or interpolating the variables back into a string. Both can be done within one object, symmetrically. To introduce something that breaks that symmetry effectively removes the requirement that it be done within one object, at which point one might as well use two separate objects for each individual operation.

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
* Paul LeoNerd Evans [2007-04-21 15:00]:
> Theirs don't have the regexp matches like mine, but I guess that could always be done after the components have been pulled out. (Also the implementation is a bit less efficient, but that can be neatened up).

I think of URIs as serialised structured objects, not just as strings; they even have an escaping mechanism. Running regexes against them to pluck out data is kinda like groping at HTML with patterns, just less likely to break because URIs are a much simpler serialisation format than markup languages.

Putting intricate regexes, eg. for zipcodes, in the URI template is like putting zipcode validation in CGI.pm. It doesn’t belong there. The place for it is several layers down, in the model, which is environment agnostic and validates data the same way whether it comes from a web request or the command line or anywhere else.

That’s how I see it.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/
Re: Module Proposal: Parse::Reversible
On 21 Apr 2007, at 13:57, Paul LeoNerd Evans wrote:
> Hmm.. Decisions... Do I change my code to use URI::Template, or release Parse::Reversible anyway, on the grounds that it does cover a slightly different area, even if in my case they're both usable?

I like the generality of Parse::Reversible. I've just done a similar thing for Perl::Version to enable it to modify version numbers but retain their formatting. I didn't have the sense to generalise it though. If you release your module I'd certainly consider modifying Perl::Version to use it.

--
Andy Armstrong, hexten.net
Module Proposal: Parse::Reversible
The requirement for this module came about initially because I was thinking about how to handle virtual URLs in websites; for example:

    /photos/album12/photo17.jpg

This will fetch the 17th photo from the 12th album, by whatever method internally is used. Internally, we need to know these values. Trying to make as generic a system as possible, I came up with the idea that somewhere in the site config would live a regexp-like pattern to explain how to parse that.

This pattern needs to be reversible - the logic that generates pages has to be able to construct URLs that give paths to the files in question. The format I came upon would look like this:

    '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg'

This pattern consists of literals, with variable interpolations embedded in it. It looks obvious from a string-generation point of view. Regexp patterns are also present, to explain what is valid in each position. This means, given this pattern, we can convert in either direction:

    /photos/album12/photo17.jpg
    gives:
    { ALBUM => 12, PHOTO => 17 }

    { ALBUM => 9, PHOTO => 15 }
    gives:
    /photos/album9/photo15.jpg

The use-case here is that patterns come from some source such as a config file, being a fairly small fixed set which is known at the time the server is started. Incoming strings or sets of replacement values come from the running of the server, which happens much more often. The implementation I have chosen compiles the pattern into two CODE references, to allow efficient runtime usage, comparable to hand-coded regexps or variable interpolation.

Also, no special considerations on the security of the patterns are made - it would be quite possible to embed arbitrary perl code within these patterns - the current implementation does not protect against this because the source of these patterns is assumed to be trusted. This would be noted in the documentation of this class.

I've written an implementation, and a test script, by way of example for how it might be used.
Find these attached. I'd appreciate some comments on this; specifically, if this functionality would be useful enough to put on CPAN, or if it seems a quite specialised solution to a specific problem and not worth doing.

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/

    package Parse::Reversable;

    use strict;
    use Carp;

    sub new
    {
       my $class = shift;
       my ( $pattern, %opts ) = @_;

       my $self = bless {
          pattern => $pattern,
       }, $class;

       # Variable names seen so far, to reject duplicates
       my %vars;

       # Source text of the regexp and of the capture-binding statements
       my $parsepattern = "";
       my $capturenumber = 1;
       my $parsebind = "";

       # Source text of each expression to concatenate when building
       my @buildparts;

       # The buildsub closure will contain elements of this array in its
       # environment
       my @literals;

       # Split into literal text and ${NAME:pattern} components
       my @components = split( m/(\$\{\w+:.*?\})/, $pattern, -1 );

       foreach my $c ( @components ) {
          next if length( $c ) == 0;

          if( $c =~ m/^\$\{(\w+):(.*)\}$/ ) {
             my ( $var, $pattern ) = ( $1, $2 );

             croak "Multiple occurrences of $var" if exists $vars{$var};
             $vars{$var} = 1;

             $parsepattern .= "($pattern)";
             $parsebind .= "\$var->{$var} = \$$capturenumber;\n";
             $capturenumber++;

             push @buildparts, "\$var->{$var}";
          }
          else {
             $parsepattern .= quotemeta $c;

             push @literals, $c;
             push @buildparts, "\$literals[$#literals]";
          }
       }

       if( $opts{allow_trail} ) {
          $parsepattern .= "(.*?)";
          $parsebind .= "\$var->{_trail} = \$$capturenumber;\n";
          $capturenumber++;
       }

       my $parsecode = "
          \$_[0] =~ m{^$parsepattern\$} or return undef;
          my \$var = {};
          $parsebind
          \$var;
       ";

       $self->{parsesub} = eval "sub { $parsecode }";

       my $buildcode = "
          my ( \$var ) = \@_;
          " . join( " . ", @buildparts ) . ";
       ";

       $self->{buildsub} = eval "sub { $buildcode }";

       return $self;
    }

    sub parse
    {
       my $self = shift;
       my ( $str ) = @_;

       return $self->{parsesub}->( $str );
    }

    sub build
    {
       my $self = shift;
       my ( $var ) = @_;

       return $self->{buildsub}->( $var );
    }

    # Keep perl happy; keep Britain tidy
    1;

[attachment: 02reversable-simple.t]
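For readers who want to try the idea without the attachment, the following is a minimal sketch of the same two-way conversion. It is not the posted implementation: it uses ordinary closures and a precompiled qr// instead of string eval, and the name compile_template is a hypothetical helper invented for this example. Like the posted code, it assumes sub-patterns contain no capturing brackets and no literal closing brace.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the posted module): turn a template
# such as '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg' into a
# pair of closures, one for parsing and one for building.
sub compile_template {
    my ($template) = @_;

    my ( @names, @parts );
    my $regex = '';

    for my $c ( split /(\$\{\w+:.*?\})/, $template ) {
        next unless length $c;

        if ( $c =~ /^\$\{(\w+):(.*)\}$/ ) {
            # A ${NAME:pattern} component: one capture group per variable
            push @names, $1;
            push @parts, { var => $1 };
            $regex .= "($2)";
        }
        else {
            # Literal text: match it exactly, emit it unchanged
            push @parts, { literal => $c };
            $regex .= quotemeta $c;
        }
    }

    my $re = qr/^$regex$/;

    my $parse = sub {
        my ($str) = @_;
        my @caps = ( $str =~ $re ) or return undef;
        my %vars;
        @vars{@names} = @caps;
        return \%vars;
    };

    my $build = sub {
        my ($vars) = @_;
        my $out = '';
        for my $p (@parts) {
            $out .= exists $p->{var} ? $vars->{ $p->{var} } : $p->{literal};
        }
        return $out;
    };

    return ( $parse, $build );
}

my ( $parse, $build ) =
    compile_template('/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg');

my $vars = $parse->('/photos/album12/photo17.jpg');
print "ALBUM=$vars->{ALBUM} PHOTO=$vars->{PHOTO}\n";

my $url = $build->( { ALBUM => 9, PHOTO => 15 } );
print "$url\n";
```

Avoiding string eval sidesteps the arbitrary-code concern raised in the proposal for the literal parts, though the sub-patterns are still interpolated into a regexp and so still need to come from a trusted source.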
Re: Module Proposal: Parse::Reversible
On 20 Apr 2007, at 18:50, Paul LeoNerd Evans wrote:
> I'd appreciate some comments on this; specifically, if this functionality would be useful enough to put on CPAN, or if it seems a quite specialised solution to a specific problem and not worth doing.

I like it. You could extend the syntax to provide for printf() formatting:

    '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg'

could optionally be

    '/photos/album${ALBUM:\d+:%04d}/photo${PHOTO:\d+:%04d}.jpg'

to get back strings like

    '/photos/album0123/photo0001.jpg'

--
Andy Armstrong, hexten.net
Re: Module Proposal: Parse::Reversible
On 4/20/07, Paul LeoNerd Evans wrote:
> The requirement for this module came about initially because I was thinking about how to handle virtual URLs in websites; for example:
>
>     /photos/album12/photo17.jpg
>
> This will fetch the 17th photo from the 12th album, by whatever method internally is used. Internally, we need to know these values. Trying to make as generic a system as possible, I came up with the idea that somewhere in the site config would live a regexp-like pattern to explain how to parse that.
>
> This pattern needs to be reversible - the logic that generates pages has to be able to construct URLs that give paths to the files in question. The format I came upon would look like this:
>
>     '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg'

This is just named capturing, isn't it?

In perl 5.10:

    qr!/photos/album(?<ALBUM>\d+)/photo(?<PHOTO>\d+).jpg!;
    $url = "/photos/album$+{ALBUM}/photo$+{PHOTO}.jpg";

With my hacky CPAN module for earlier perls:

    use Regexp::NamedCaptures;
    my %photo;
    qr!/photos/album(?<\$photo{ALBUM}>\d+)/photo(?<\$photo{PHOTO}>\d+).jpg!;
    $url = "/photos/album$photo{ALBUM}/photo$photo{PHOTO}.jpg";

Josh
Re: Module Proposal: Parse::Reversible
On Fri, 20 Apr 2007 11:25:48 -0700 Joshua ben Jore wrote:
> This is just named capturing, isn't it?
>
> In perl 5.10:
>
>     qr!/photos/album(?<ALBUM>\d+)/photo(?<PHOTO>\d+).jpg!;
>     $url = "/photos/album$+{ALBUM}/photo$+{PHOTO}.jpg";

Oh, it's that and more. It's a named capture, sure. But it's also reversible, don't forget. I don't think 5.10 lets you do the reverse like mine does, does it? Supply values for the sub-patterns?

Note that mine does both directions in one configuration - that one string that would live in the config file specifies both parsing and rebuilding, rather than your example there requiring two separate strings.

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
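The "two separate strings" point can be seen in a small runnable sketch of the perl 5.10 approach. The anchors, the escaped dot and the %vars copy are additions made for this example; they are not taken from either message.

```perl
use strict;
use warnings;
use 5.010;

# String one: the regexp, with named captures, used for parsing
my $re = qr{^/photos/album(?<ALBUM>\d+)/photo(?<PHOTO>\d+)\.jpg$};

my %vars;
if ( '/photos/album12/photo17.jpg' =~ $re ) {
    %vars = %+;    # copy the named captures before a later match replaces them
}

# String two: a separately maintained interpolation for the reverse direction
my $url = "/photos/album$vars{ALBUM}/photo$vars{PHOTO}.jpg";
print "$url\n";
```

Nothing ties the regexp and the interpolated string together, so changing the URL layout means editing both and keeping them in agreement by hand.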
Re: Module Proposal: Parse::Reversible
On Fri, 20 Apr 2007 18:55:56 +0100 Andy Armstrong wrote:
> could optionally be
>
>     '/photos/album${ALBUM:\d+:%04d}/photo${PHOTO:\d+:%04d}.jpg'
>
> to get back strings like
>
>     '/photos/album0123/photo0001.jpg'

An interesting idea, but what does that buy you that a plain sprintf does not?

    $pr->build( {
       ALBUM => sprintf("%04d", $album),
       PHOTO => sprintf("%04d", $photo)
    } )

In this example with numbers as the keys perhaps it looks useful, but isn't that a special case? In most cases, we couldn't do anything special like zero-pad the numbers. For example, if we had users in groups:

    '/groups/${GROUP:\w+}/users/${USER:\w+}.html'

There's no other printf format of any interest that comes to mind, other than %s.

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
Re: Module Proposal: Parse::Reversible
On Fri, 20 Apr 2007 21:26:33 +0100 Andy Armstrong wrote:
>> An interesting idea, but what does that buy you that a plain sprintf does not?
>>
>>     $pr->build( {
>>        ALBUM => sprintf("%04d", $album),
>>        PHOTO => sprintf("%04d", $photo)
>>     } )
>
> It encapsulates the formatting requirement where it belongs - with the rest of the specification for that string.

I guess there is that; this would seem a good place to do that. But I am slightly reluctant to do that, for the following reason. Without those printf formats, there is a large invariant symmetry here:

    my $pr = Parse::Reversible->new( 'any format you like' );

    $str == $pr->build( $pr->parse( $str ) );

    $vars == $pr->parse( $pr->build( $vars ) );

With format specifiers, we'd break that.

> I agree that it's not massively general - but you could use it to
>
>  * generate fixed width fields
>  * truncate reals to ints
>  * specify the number of decimal places

Mmm... Though those do sound like quite useful things to have the ability to do... Are we sure on the notation format though? It gets quite hard to parse by this stage if we have

    ${NAME:pattern:format}

If nothing else, what happens if we want a literal : in our pattern - do we need to escape it? Or maybe, to make it look more like a pattern, we might try

    ${NAME/pattern/format}

--
Paul LeoNerd Evans
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
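The symmetry argument can be checked by hand with a plain regexp and sprintf. This is only a sketch of the invariant using the photo URL from the thread; it does not use the proposed module, and the variable names are chosen for this example.

```perl
use strict;
use warnings;

my $str = '/photos/album12/photo17.jpg';

# "Parse": pull the two numbers out of the string
my ( $album, $photo ) = $str =~ m{^/photos/album(\d+)/photo(\d+)\.jpg$};

# "Build" without any format: the original string comes back
my $plain = "/photos/album$album/photo$photo.jpg";

# "Build" with a %04d format on each field: the round trip changes the string
my $padded = sprintf '/photos/album%04d/photo%04d.jpg', $album, $photo;

print "plain round trip:  ", ( $plain  eq $str ? "same" : "different" ), "\n";
print "padded round trip: ", ( $padded eq $str ? "same" : "different" ), "\n";
```

With the format applied, build(parse($str)) is no longer guaranteed to equal $str, which is the loss of symmetry described above.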
Re: Module Proposal: Parse::Reversible
* Paul LeoNerd Evans [2007-04-20 19:55]:
> The requirement for this module came about initially because I was thinking about how to handle virtual URLs in websites

It’s called “URI templates”, has an IETF draft RFC and there’s a tentative implementation already on the CPAN.

Regards,
--
Aristotle Pagaltzis // http://plasmasturm.org/