Re: Module Proposal: Parse::Reversible

2007-04-25 Thread David Nicol

has anyone mentioned xeger yet in this discussion?


Re: Module Proposal: Parse::Reversible

2007-04-25 Thread Paul LeoNerd Evans
On Wed, 25 Apr 2007 13:44:11 -0500
David Nicol [EMAIL PROTECTED] wrote:

> has anyone mentioned xeger yet in this discussion?

Not heard of that, no..

And it's unfortunately a little late now, behold:

  http://search.cpan.org/~pevans/String-MatchInterpolate-0.01/

An initial version :)

I'm still inviting comment on its behaviour though...

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-25 Thread Andy Armstrong

On 25 Apr 2007, at 21:18, Paul LeoNerd Evans wrote:

> I'm still inviting comment on its behaviour though...

I like it. After

  "This regexp should not contain any capture brackets ( ) as these
  will confuse the parsing logic."

I'd add "Instead use non-capturing brackets (?: ) to group subparts
of the regexp." just to avoid anyone cargo-culting the notion that
you can't use brackets at all.
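
To illustrate the point about brackets, here is a minimal sketch (not taken from the module) of how an extra capturing group shifts the numbering of later captures, while a non-capturing group leaves it alone. The sub name second_capture and the sample strings are made up for this example.

```perl
use strict;
use warnings;

# Return the value of capture group 2 for a given regexp and string,
# or undef if the match fails or there is no second group.
sub second_capture {
    my ( $re, $str ) = @_;
    return $str =~ $re ? $2 : undef;
}

# Capturing brackets inside the first field add a group of their own,
# so group 2 is the inner (a|b) rather than the second field.
my $capturing = qr{\A/((a|b)\d+)/(\w+)\z};

# Non-capturing brackets group the alternation without adding a group,
# so group 2 is the second field as intended.
my $non_capturing = qr{\A/((?:a|b)\d+)/(\w+)\z};

my $shifted  = second_capture( $capturing,     "/a12/photo" );
my $intended = second_capture( $non_capturing, "/a12/photo" );
print "capturing: $shifted, non-capturing: $intended\n";
```

Code that binds variables by capture number, as the proposed module does, would pick up the wrong value in the first case.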


-- 
Andy Armstrong, hexten.net



Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Paul LeoNerd Evans
On Fri, Apr 20, 2007 at 06:50:03PM +0100, Paul LeoNerd Evans wrote:
> package Parse::Reversable;

I'm suddenly not so sure on the name any more...

It's not just parsing, it's not just interpolation. It's both. To name
it after one of these operations ignores the other.

So I think somewhere under either Text:: or String:: might be better.

What's the difference between these two roots? Why might one be favoured
over the other?

And what actual name? It pains me that my best attempt so far is

  String::ParsableInterpolable

Surely we can do better than that?

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Paul LeoNerd Evans
On Tue, Apr 24, 2007 at 01:31:02PM +0100, Paul LeoNerd Evans wrote:
>   String::ParsableInterpolable
>
> Surely we can do better than that?

Actually, I'm not even sure on the parsable part now. Parsing would
imply some sort of possibly-recursive, context-aware grammar system.
This is much simpler - just literal strings with regexp-matched
substrings in them.

Perhaps more Matchable than Parsable.

  String::MatchInterpolate

maybe?

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Sébastien Aperghis-Tramoni

Paul LeoNerd Evans wrote:


> No, I think at this point we have to appeal to the core reason for
> creating this module in the first place; namely, that it is
> bidirectional. Parsing a string into variables, or interpolating the
> variables back into a string. Both can be done within one object,
> symmetrically. To introduce something that breaks that symmetry
> effectively removes the requirement that it be done within one object, at
> which point one might as well use two separate objects for each individual
> operation.


If the main objects your module will manipulate still are URIs, maybe  
it should be in the URI:: namespace. And couldn't the bidirectional  
relation you want to create be seen like a mapping? Hence URI::Mapper  
or something similar?


--
Sébastien Aperghis-Tramoni

Close the world, txEn eht nepO.




Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Paul LeoNerd Evans
On Tue, Apr 24, 2007 at 04:18:11PM +0200, Sébastien Aperghis-Tramoni wrote:
> If the main objects your module will manipulate still are URIs, maybe
> it should be in the URI:: namespace.  And couldn't the bidirectional
> relation you want to create be seen like a mapping? Hence URI::Mapper
> or something similar?

Not necessarily. I was trying to keep it generic. URL/URIs, file paths,
ID strings,... anything that has a simple string format which includes
variables somehow embedded in it. LDAP DNs maybe?

  dn: cn=${NAME:\w+}, o=${UNIT:\w+}, dc=example, dc=com

I wanted to leave it generic at a string level.

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Paul LeoNerd Evans
On Tue, Apr 24, 2007 at 04:01:10PM +0100, Andy Armstrong wrote:
> Text::Transform::Reversible ?

Transform is too generic.. text goes in, other text goes out... That
doesn't capture the essence of pattern matching (no pun intended :) ).

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread A. Pagaltzis
* Paul LeoNerd Evans [EMAIL PROTECTED] [2007-04-24 14:35]:
> It's not just parsing, it's not just interpolation. It's both.
> To name it after one of these operations ignores the other.
>
> So I think somewhere under either Text:: or String:: might be
> better.

String::Template?

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Andy Armstrong

On 24 Apr 2007, at 18:07, A. Pagaltzis wrote:

> > So I think somewhere under either Text:: or String:: might be
> > better.
>
> String::Template?


String::Template::Reversible maybe? String::Template sounds like a  
namespace rather than a module.


--
Andy Armstrong, hexten.net



Re: Module Proposal: Parse::Reversible

2007-04-24 Thread A. Pagaltzis
* Andy Armstrong [EMAIL PROTECTED] [2007-04-24 19:15]:
> String::Template::Reversible maybe? String::Template sounds
> like a namespace rather than a module.

I don’t know what it means for something to “sound like a
namespace.” :-)

Also, I think of templates as generally reversible anyway.
Think of printf/scanf, strftime/strptime, URI::Template,
etc. String templates often go both ways.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Paul LeoNerd Evans
On Tue, Apr 24, 2007 at 07:30:56PM +0200, A. Pagaltzis wrote:
> I don’t know what it means for something to “sound like a
> namespace.” :-)
>
> Also, I think of templates as generally reversible anyway.
> Think of printf/scanf, strftime/strptime, URI::Template,
> etc. String templates often go both ways.

Well, most of the decent-sized template modules on CPAN aren't
inherently reversible... Consider Template::Toolkit, Text::Template,
HTML::Template, Mason, ...

Also, String::Template sounds too much like Text::Template, which it
isn't really.. It's a totally different idea.

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread A. Pagaltzis
* Paul LeoNerd Evans [EMAIL PROTECTED] [2007-04-24 19:40]:
> Also, String::Template sounds too much like Text::Template,
> which it isn't really.. It's a totally different idea.

Not to me. “Text” to me means a document (or some arbitrarily small
unit of a document) that has meaning to a human. A “String” is a
data type consisting of an ordered sequence of characters – an
entity in terms of computing.

Admittedly the difference may appear sort of hand-wavy, but to
me, “Text::Template” and “String::Template” convey extremely
different things.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Daniel T. Staal

On Tue, April 24, 2007 11:05 am, Paul LeoNerd Evans said:
> On Tue, Apr 24, 2007 at 04:01:10PM +0100, Andy Armstrong wrote:
> > Text::Transform::Reversible ?
>
> Transform is too generic.. text goes in, other text goes out... That
> doesn't capture the essence of pattern matching (no pun intended :) ).

How about Text::Transform::ReversiblePattern ?

Or even just Text::ReversiblePattern ?

Daniel T. Staal

---
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---



Re: Module Proposal: Parse::Reversible

2007-04-24 Thread Paul LeoNerd Evans
On Tue, 24 Apr 2007 14:20:40 -0400 (EDT)
Daniel T. Staal [EMAIL PROTECTED] wrote:

> How about Text::Transform::ReversiblePattern ?
>
> Or even just Text::ReversiblePattern ?
>
> Daniel T. Staal

I am tempted by that, but I would prefer it in the String:: space. As
A. Pagaltzis points out below, a string is just a sequence of characters,
which is what we have here, whereas text would imply some higher-level
human meaning.

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-22 Thread Paul LeoNerd Evans
On Fri, 20 Apr 2007 21:26:33 +0100
Andy Armstrong [EMAIL PROTECTED] wrote:

> I agree that it's not massively general - but you could use it to
>
> * generate fixed width fields
> * truncate reals to ints
> * specify the number of decimal places

On further thought, that all sounds very specific to numbers, and hard to
generalise. It also breaks the symmetry of operations, as I have already
noted.

Also, someone else objects by private email that:

> Surely you've just reintroduced what you were trying to eliminate in
> the first place:  separate languages for describing patterns and for
> describing formatting.

This does indeed seem to be the case. If you wanted full formatting
control, you might as well go with two separate strings; one for pattern
capture, one for formatted interpolation.

> Given a sprintf format, it's easy to work out what the pattern for it
> would be, so can the pattern be optional?

This is equally difficult in this direction.

Consider for a moment a cut-down case of UK Postal Codes, which would
match a regexp

  ${POSTCODE:[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]}

Were we instead to provide just a printf format for this, the only one
that comes to mind is '%s'. Hard to say how we'd ever know to generate
that specific pattern from there.

No, I think at this point we have to appeal to the core reason for
creating this module in the first place; namely, that it is
bidirectional. Parsing a string into variables, or interpolating the
variables back into a string. Both can be done within one object,
symmetrically. To introduce something that breaks that symmetry
effectively removes the requirement that it be done within one object, at
which point one might as well use two separate objects for each individual
operation.

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-22 Thread A. Pagaltzis
* Paul LeoNerd Evans [EMAIL PROTECTED] [2007-04-21 15:00]:
> Theirs don't have the regexp matches like mine, but I guess
> that could always be done after the components have been pulled
> out. (Also the implementation is a bit less efficient, but that
> can be neatened up).

I think of URIs as serialised structured objects, not just as
strings; they even have an escaping mechanism. Running regexes
against them to pluck out data is kinda like groping at HTML with
patterns, just less likely to break because URIs are a much
simpler serialisation format than markup languages.

Putting intricate regexes, eg. for zipcodes, in the URI template
is like putting zipcode validation in CGI.pm. It doesn’t belong
there. The place for it is several layers down, in the model,
which is environment agnostic and validates data the same way
whether it comes from a web request or the command line or
anywhere else.

That’s how I see it.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Re: Module Proposal: Parse::Reversible

2007-04-21 Thread Andy Armstrong

On 21 Apr 2007, at 13:57, Paul LeoNerd Evans wrote:
> Hmm.. Decisions... Do I change my code to use URI::Template, or release
> Parse::Reversible anyway, on the grounds that it does cover a slightly
> different area, even if in my case they're both usable?


I like the generality of Parse::Reversible. I've just done a similar  
thing for Perl::Version to enable it to modify version numbers but  
retain their formatting. I didn't have the sense to generalise it  
though. If you release your module I'd certainly consider modifying  
Perl::Version to use it.


-- 
Andy Armstrong, hexten.net



Module Proposal: Parse::Reversible

2007-04-20 Thread Paul LeoNerd Evans
The requirement for this module came about initially because I was
thinking about how to handle virtual URLs in websites; for example:

  /photos/album12/photo17.jpg

This will fetch the 17th photo from the 12th album, by whatever method
internally is used. Internally, we need to know these values. Trying to
make as generic a system as possible, I came up with the idea that
somewhere in site config, would live a regexp-like pattern to explain
how to parse that. This pattern needs to be reversible - the logic that
generates pages has to be able to construct URLs that give paths to the
files in question.

The format I came upon would look like this:

  '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg'

This pattern consists of literals, with variable interpolations embedded
in it. Looks obvious from a string-generation point of view. Also regexp
patterns are present, to explain what is valid in each position.

This means, given this pattern, we can convert in either direction:

  /photos/album12/photo17.jpg
gives: { ALBUM => 12, PHOTO => 17 }

  { ALBUM => 9, PHOTO => 15 }
gives: /photos/album9/photo15.jpg

The use-case here is that patterns come from some source such as a
config file, forming a fairly small fixed set which is known at the time
the server is started. Incoming strings or sets of replacement values
arrive while the server is running, which happens much more often. The
implementation I have chosen compiles the pattern into two CODE
references, to allow efficient runtime usage, comparable to hand-coded
regexps or variable interpolation. Also, no special consideration is
given to the security of the patterns - it would be quite possible to
embed arbitrary perl code within them. The current implementation does
not protect against this because the source of these patterns is
assumed to be trusted. This would be noted in the documentation of this
class.
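
As an aside on the security point, the same "compile once into two CODE references" idea can be sketched with closures instead of string eval, so the pattern text is only ever used as a regexp and never as perl source. This is an illustrative alternative, not the attached implementation; the name compile_pattern is hypothetical, and the sketch assumes field regexps contain neither '}' nor capturing brackets.

```perl
use strict;
use warnings;

# Compile a pattern such as
#   '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg'
# into two CODE references: one that parses a string into a hash of
# variables, and one that builds a string from such a hash.
sub compile_pattern {
    my ($pattern) = @_;

    my $regexp = '';
    my ( @names, @parts );

    # split with a capturing group keeps the ${NAME:regexp} fields
    for my $piece ( split /(\$\{\w+:[^}]*\})/, $pattern ) {
        next unless length $piece;
        if ( $piece =~ /\A\$\{(\w+):([^}]*)\}\z/ ) {
            my ( $name, $subpattern ) = ( $1, $2 );
            push @names, $name;
            push @parts, [ var => $name ];
            $regexp .= "($subpattern)";
        }
        else {
            push @parts, [ literal => $piece ];
            $regexp .= quotemeta $piece;
        }
    }
    my $re = qr/\A$regexp\z/;

    my $parse = sub {
        my ($str) = @_;
        my @values = $str =~ $re or return undef;
        my %vars;
        @vars{@names} = @values;
        return \%vars;
    };

    my $build = sub {
        my ($vars) = @_;
        my $out = '';
        for my $part (@parts) {
            my ( $kind, $value ) = @$part;
            $out .= $kind eq 'var' ? $vars->{$value} : $value;
        }
        return $out;
    };

    return ( $parse, $build );
}

my ( $parse, $build ) =
    compile_pattern('/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg');
my $vars = $parse->('/photos/album12/photo17.jpg');
my $url  = $build->( { ALBUM => 9, PHOTO => 15 } );
print "ALBUM=$vars->{ALBUM} PHOTO=$vars->{PHOTO}\n$url\n";
```

The trade-off is that the build closure loops over the parts at runtime rather than running a single generated concatenation expression, so it is likely somewhat slower than the eval-based approach.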

I've written an implementation of code, and a test script, by way of
example for how it might be used. Find these attached.

I'd appreciate some comments on this; specifically, if this
functionality would be useful enough to put on CPAN, or if it seems a
quite specialised solution to a specific problem and not worth doing.

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/

package Parse::Reversable;

use strict;

use Carp;

sub new
{
   my $class = shift;
   my ( $pattern, %opts ) = @_;

   my $self = bless {
      pattern => $pattern,
   }, $class;

   my %vars;

   # Source text of the regexp and of the capture bindings used by the
   # generated parse sub
   my $parsepattern = "";
   my $capturenumber = 1;
   my $parsebind = "";

   # Source text of the expressions concatenated by the generated build sub
   my @buildparts;

   # The buildsub closure will contain elements of this array in its
   # environment
   my @literals;

   my @components = split( m/(\$\{\w+:.*?\})/, $pattern, -1 );
   foreach my $c ( @components ) {
      next if length( $c ) == 0;
      if( $c =~ m/^\$\{(\w+):(.*)\}$/ ) {
         my ( $var, $pattern ) = ( $1, $2 );
         croak "Multiple occurrences of $var" if exists $vars{$var};
         $vars{$var} = 1;

         $parsepattern .= "($pattern)";
         $parsebind .= "   \$var->{$var} = \$$capturenumber;\n";
         $capturenumber++;

         push @buildparts, "\$var->{$var}";
      }
      else {
         $parsepattern .= quotemeta $c;

         push @literals, $c;
         push @buildparts, "\$literals[$#literals]";
      }
   }

   if( $opts{allow_trail} ) {
      $parsepattern .= "(.*?)";
      $parsebind .= "   \$var->{_trail} = \$$capturenumber;\n";
      $capturenumber++;
   }

   my $parsecode = "
   \$_[0] =~ m{^$parsepattern\$} or return undef;
   my \$var = {};
$parsebind
   \$var;
";

   # The pattern text is compiled as perl code, so it must be trusted
   $self->{parsesub} = eval "sub { $parsecode }"
      or croak "Cannot compile parse code: $@";

   my $buildcode = "
   my ( \$var ) = \@_;
" . join( " . ", @buildparts ) . ";
";

   $self->{buildsub} = eval "sub { $buildcode }"
      or croak "Cannot compile build code: $@";

   return $self;
}

sub parse
{
   my $self = shift;
   my ( $str ) = @_;
   return $self->{parsesub}->( $str );
}

sub build
{
   my $self = shift;
   my ( $var ) = @_;
   return $self->{buildsub}->( $var );
}

# Keep perl happy; keep Britain tidy
1;


02reversable-simple.t
Description: Troff document


Re: Module Proposal: Parse::Reversible

2007-04-20 Thread Andy Armstrong

On 20 Apr 2007, at 18:50, Paul LeoNerd Evans wrote:

> I'd appreciate some comments on this; specifically, if this
> functionality would be useful enough to put on CPAN, or if it seems a
> quite specialised solution to a specific problem and not worth doing.


I like it. You could extend the syntax to provide for printf()  
formatting:


  '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg'

could optionally be

  '/photos/album${ALBUM:\d+:%04d}/photo${PHOTO:\d+:%04d}.jpg'

to get back strings like

  '/photos/album0123/photo0001.jpg'
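
A rough sketch of how the build direction of this extended syntax might behave, assuming an optional third part of each field is handed to sprintf. The sub name build_formatted is hypothetical, the pattern part is ignored here, and the sketch assumes neither the pattern nor the format contains ':' or '}'.

```perl
use strict;
use warnings;

# Build a string from a template whose fields look like
# ${NAME:pattern} or ${NAME:pattern:format}. When a format is
# present the value goes through sprintf, otherwise it is
# interpolated unchanged.
sub build_formatted {
    my ( $template, $vars ) = @_;
    $template =~ s/\$\{(\w+):[^:}]*(?::([^}]*))?\}/defined $2 ? sprintf( $2, $vars->{$1} ) : $vars->{$1}/ge;
    return $template;
}

my $url = build_formatted(
    '/photos/album${ALBUM:\d+:%04d}/photo${PHOTO:\d+:%04d}.jpg',
    { ALBUM => 123, PHOTO => 1 },
);
print "$url\n";    # prints /photos/album0123/photo0001.jpg
```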

-- 
Andy Armstrong, hexten.net



Re: Module Proposal: Parse::Reversible

2007-04-20 Thread Joshua ben Jore

On 4/20/07, Paul LeoNerd Evans [EMAIL PROTECTED] wrote:

> The requirement for this module came about initially because I was
> thinking about how to handle virtual URLs in websites; for example:
>
>   /photos/album12/photo17.jpg
>
> This will fetch the 17th photo from the 12th album, by whatever method
> internally is used. Internally, we need to know these values. Trying to
> make as generic a system as possible, I came up with the idea that
> somewhere in site config, would live a regexp-like pattern to explain
> how to parse that. This pattern needs to be reversible - the logic that
> generates pages has to be able to construct URLs that give paths to the
> files in question.
>
> The format I came upon would look like this:
>
>   '/photos/album${ALBUM:\d+}/photo${PHOTO:\d+}.jpg'


This is just named capturing, isn't it? In perl 5.10:

 qr!/photos/album(?<ALBUM>\d+)/photo(?<PHOTO>\d+).jpg!;
 $url = "/photos/album$+{ALBUM}/photo$+{PHOTO}.jpg";

With my hacky CPAN module for earlier perls:

 use Regexp::NamedCaptures;
 my %photo;
 qr!/photos/album(?<\$photo{ALBUM}>\d+)/photo(?<\$photo{PHOTO}>\d+).jpg!;
 $url = "/photos/album$photo{ALBUM}/photo$photo{PHOTO}.jpg";
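
For anyone wanting to try the 5.10 variant, a self-contained sketch might look like the following. The sub names parse_url and build_url are made up for the example, and the regexp is anchored and the dot escaped, which the one-liner above does not do.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need perl 5.10 or later

# The parsing direction: a regexp with named captures.
my $photo_re = qr{\A/photos/album(?<ALBUM>\d+)/photo(?<PHOTO>\d+)\.jpg\z};

sub parse_url {
    my ($url) = @_;
    return undef unless $url =~ $photo_re;
    my %vars = %+;    # copy the captures before another match replaces them
    return \%vars;
}

# The building direction: a separate interpolated string.
sub build_url {
    my ($vars) = @_;
    return "/photos/album$vars->{ALBUM}/photo$vars->{PHOTO}.jpg";
}

my $vars = parse_url('/photos/album12/photo17.jpg');
my $url  = build_url($vars);
print "$url\n";
```

Note that the URL layout is written out twice here, once in the regexp and once in the string.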

Josh


Re: Module Proposal: Parse::Reversible

2007-04-20 Thread Paul LeoNerd Evans
On Fri, 20 Apr 2007 11:25:48 -0700
Joshua ben Jore [EMAIL PROTECTED] wrote:

> This is just named capturing, isn't it? In perl 5.10:
>
>   qr!/photos/album(?<ALBUM>\d+)/photo(?<PHOTO>\d+).jpg!;
>   $url = "/photos/album$+{ALBUM}/photo$+{PHOTO}.jpg";

Oh, it's that and more. It's a named capture, sure. But it's also
reversible, don't forget. I don't think 5.10 lets you do the reverse like
mine does, does it? Supply values for the sub-patterns?

Note that mine does both directions in one configuration - that one
string that would live in the config file specifies both parsing and
rebuilding, rather than your example there requiring two separate strings.

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-20 Thread Paul LeoNerd Evans
On Fri, 20 Apr 2007 18:55:56 +0100
Andy Armstrong [EMAIL PROTECTED] wrote:

> could optionally be
>
>   '/photos/album${ALBUM:\d+:%04d}/photo${PHOTO:\d+:%04d}.jpg'
>
> to get back strings like
>
>   '/photos/album0123/photo0001.jpg'

An interesting idea, but what does that buy you that a plain sprintf does
not?

  $pr->build( { ALBUM => sprintf("%04d", $album),
                PHOTO => sprintf("%04d", $photo) } )

In this example with numbers as the keys perhaps it looks useful, but
isn't that a special case? In most cases, we couldn't do anything special
like zero-pad the numbers.

For example, if we had users in groups:

 '/groups/${GROUP:\w+}/users/${USER:\w+}.html'

There's no other printf format of any interest that comes to mind, other
than %s.

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-20 Thread Paul LeoNerd Evans
On Fri, 20 Apr 2007 21:26:33 +0100
Andy Armstrong [EMAIL PROTECTED] wrote:

> > An interesting idea, but what does that buy you that a plain sprintf
> > does not?
> >
> >   $pr->build( { ALBUM => sprintf("%04d", $album),
> >                 PHOTO => sprintf("%04d", $photo) } )
>
> It encapsulates the formatting requirement where it belongs - with
> the rest of the specification for that string.

I guess there is that; this would seem a good place to do that.

But, I am slightly reluctant to do that, for the following reason.
Without those printf formats, there is a large invariant symmetry
here:

  my $pr = Parse::Reversible->new( 'any format you like' );

  $str  == $pr->build( $pr->parse( $str  ) );

  $vars == $pr->parse( $pr->build( $vars ) );
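
To make the symmetry concrete, here is a small sketch using a hand-written parse/build pair for one fixed pattern, standing in for the object above. The names parse_photo and build_photo and the $width argument are hypothetical: a width of 0 means plain interpolation, anything else zero-pads like a %0Nd format would.

```perl
use strict;
use warnings;

# Parse one fixed URL layout into its two variables.
sub parse_photo {
    my ($str) = @_;
    return $str =~ m{\A/photos/album(\d+)/photo(\d+)\.jpg\z}
        ? { ALBUM => $1, PHOTO => $2 }
        : undef;
}

# Build the URL back, optionally zero-padding the numbers.
sub build_photo {
    my ( $vars, $width ) = @_;
    my $fmt = $width ? "%0${width}d" : "%s";
    return sprintf "/photos/album$fmt/photo$fmt.jpg",
        $vars->{ALBUM}, $vars->{PHOTO};
}

my $str = '/photos/album12/photo17.jpg';

# Without formatting, build(parse($str)) gives $str back.
my $plain = build_photo( parse_photo($str), 0 );
print $plain eq $str ? "symmetric\n" : "broken\n";

# With zero-padding, the round trip yields a different string.
my $padded = build_photo( parse_photo($str), 4 );
print $padded eq $str ? "symmetric\n" : "broken\n";
```

The first check holds and the second does not, since the padded round trip produces /photos/album0012/photo0017.jpg.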

With format specifiers, we'd break that.

> I agree that it's not massively general - but you could use it to
>
> * generate fixed width fields
> * truncate reals to ints
> * specify the number of decimal places

Mmm... Though those do sound like quite useful things to have the ability
to do...

Are we sure on the notation format though? It gets quite hard to parse by
this stage if we have

  ${NAME:pattern:format}

if only because of this question: what happens if we want a literal : in
our pattern - do we need to escape it?

Or maybe to make it look more like a pattern, we might try

  ${NAME/pattern/format}

-- 
Paul LeoNerd Evans

[EMAIL PROTECTED]
ICQ# 4135350   |  Registered Linux# 179460
http://www.leonerd.org.uk/


Re: Module Proposal: Parse::Reversible

2007-04-20 Thread A. Pagaltzis
* Paul LeoNerd Evans [EMAIL PROTECTED] [2007-04-20 19:55]:
> The requirement for this module came about initially because I
> was thinking about how to handle virtual URLs in websites

It’s called “URI templates”, has an IETF draft RFC and there’s a
tentative implementation already on the CPAN.

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/