Hello,
Enclosed in this mail (882 lines - sorry if this is too big an
attachment for the axkit-users list policy) is a draft version of a
new XML::XPathScript implementation. I am seeking your validation on
my work.
Already done:
* documentation revamped, with a hymn to the benefits of
XPathScript at the top :-)
* adjustments to the architecture to support extensibility, in
order to be able to implement the AxKit version as a derived
object class. THIS IS THE PART THAT NEEDS REVIEWING MOST (reading
the POD ought to be enough to comment upon the planned interface)
* POD unit documentation for most of the innards (stylesheet
callbacks in ::Toys still to be documented)
* "stylesheetfile" option to new(), that renders line numbers
and file names correctly (for the debugger)
* no longer chdir()s to get the relative #includes right
* output to something else than stdout (untested)
Still to be done (I deviated off the initially proposed route by a
few miles, since I didn't intend this first milestone to begin with):
* implement the CODEMEs, FIXMEs, TRYMEs and DOCUMENTMEs in the
current code (I'll do that)
* package a new XML::XPathScript release, update
/usr/bin/xpathscript to use the styleseetfile form of new() (so
that we get line numbers correct in the debugger with the new
version, and do not need to chdir() anymore)
* try to implement AxKit as a subclass, and in the same time port
all enhancements made in AxKit's XPathScript to the standalone
class. This involves pondering over the diff between XPathScript
0.03 (the version in CPAN now) and AxKit current, and deciding
for each line whether it should be refactored or patched into
XML::XPathScript current (the version I'm sending now). Yanick,
would you please give this a try ?
Postponed work until merge of the two XPathScripts is complete:
* document ::Toys
* make a test suite (I'm using my own stylesheet's test suite right now)
--
Dominique QUATRAVAUX Ing�nieur d�veloppeur senior
01 44 42 00 08 IDEALX
#!/usr/bin/perl -w
use strict;
=pod "
=head1 NAME
XML::XPathScript - a Perl framework for XML stylesheets
=head1 SYNOPSIS
use XML::XPathScript;
my $xps = XML::XPathScript->new(xml => $xml, stylesheet => $stylesheet);
# The short way:
$xps->process();
# The long way (caching the compiled stylesheet for reuse and
# outputting to multiple files):
my $xps = XML::XPathScript->new(stylesheetfile => $filename);
my $handler=$xps->compile();
foreach my $xmltree (@xmltrees) {
use IO::File;
my $currentIO=new IO::File($some_name_that_changes_each_time);
&$handler($xmltree, sub {$currentIO->print(shift)});
};
# Making extra variables available to the stylesheet dialect:
my $handler=$xps->compile('$r');
&$handler($xmltree,undef,Apache->request());
=head1 DESCRIPTION
This is the I<XML::XPathScript> stylesheet framework, part of the
AxKit project at http://axkit.org/.
XPathScript is a stylesheet language similar in many ways to XSLT (in
concept, not in appearance), for transforming XML from one format to
another format (possibly HTML, but XPathScript also shines for
non-XML-like output).
Like XSLT, XPathScript uses the powerful ``templates/apply-templates''
and ``cascading stylesheets'' design patterns, that greatly simplifies
the design of stylesheets for programmers. The availability of the
I<XPath> query language inside stylesheets promotes the use of a purely
document-dependent, side-effect-free coding style. Unlike XSLT which
uses its own dedicated control language with an XML-compliant syntax,
XPathScript uses Perl which is terse and highly extendable.
The result of the merge is an extremely powerful environment for
development tasks that involve rendering complex XML documents to
other formats. Stylesheets written in XPathScript are very easy to
create, extend and reuse, even if they treat hundreds of different
XML tags.
=head1 STYLESHEET WRITER DOCUMENTATION
=head2 Creating stylesheets
See http://axkit.org/docs/xpathscript/guide.dkb for a head start.
There you will learn how to markup the embedded dialect and fill in
the template hash $t.
=head2 xpathscript Invocation
This CPAN module is bundled with an "xpathscript" shell tool that
is to be invoked like this:
xpathscript mydocument.xml mystylesheet.xps
It will produce the resulting document on standard output. More
options will be added later (select output file, pass parameters to
the stylesheet etc.).
=head2 Functions available in the stylesheet
A number of callback functions are available from the stylesheet
proper. They apply against the current document and template hash,
which are transparently passed back and forth as global variables (see
L</Global variables>). They are defined in the
I<XML::XPathScript::Toys> package, which is implicitly imported into
all code written in the embedded stylesheet dialect.
=cut "
package XML::XPathScript::Toys;
use XML::XPath::Node;
use vars '@ISA', '@EXPORT';
use Exporter;
@ISA = ('Exporter');
@EXPORT = qw(
findnodes
findvalue
findvalues
findnodes_as_string
apply_templates
matches
set_namespace
);
# quieten warnings when compiling this module
sub apply_templates (;$@);
sub findnodes {
$XML::XPathScript::xp->findnodes(@_);
}
sub findvalue {
$XML::XPathScript::xp->findvalue(@_);
}
sub findvalues {
my @nodes = findnodes(@_);
map { findvalue('.', $_) } @nodes;
}
sub findnodes_as_string {
$XML::XPathScript::xp->findnodes_as_string(@_);
}
sub matches {
$XML::XPathScript::xp->matches(@_);
}
sub set_namespace {
eval {
$XML::XPathScript::xp->set_namespace(@_);
};
if ($@) {
warn "set_namespace failed: $@";
}
}
sub apply_templates (;$@) {
unless (@_) {
return apply_templates(findnodes('/'));
}
my ($arg1, @args) = @_;
if (!ref($arg1)) {
# called with a path to find
warn "apply_templates with path '$arg1'\n";
return apply_templates(findnodes($arg1, @args));
}
my $retval = '';
if ($arg1->isa('XML::XPath::NodeSet')) {
foreach my $node ($arg1->get_nodelist) {
$retval .= translate_node($node);
}
}
else {
$retval .= translate_node($arg1);
foreach my $node (@args) {
$retval .= translate_node($node);
}
}
return $retval;
}
sub _apply_templates {
my @nodes = @_;
my $retval = '';
foreach my $node (@nodes) {
$retval .= translate_node($node);
}
return $retval;
}
sub translate_node {
my $node = shift;
local $^W;
my $translations = $XML::XPathScript::trans;
if (!$node->isElementNode) {
# don't output top-level PI's
if ($node->isPINode) {
return try {
if ($node->getParentNode->getParentNode) {
return $node->toString;
}
return '';
} catch Error with {
return '';
};
}
return $node->toString;
}
# warn "translate_node: ", $node->getName, "\n";
my $node_name = $node->getName;
my $trans = $translations->{$node_name};
if (!$trans) {
$node_name = '*';
$trans = $translations->{$node_name};
}
if (!$trans) {
return start_tag($node) .
_apply_templates($node->getChildNodes) .
end_tag($node);
}
local $^W;
my $dokids = 1;
my $search;
my $t = {};
if ($trans->{testcode}) {
# warn "Evalling testcode\n";
my $result = $trans->{testcode}->($node, $t);
if ($result eq "0") {
# don't process anything.
return;
}
if ($result eq "-1") {
# -1 means don't do children.
$dokids = 0;
}
elsif ($result eq "1") {
# do kids
}
else {
$dokids = 0;
$search = $result;
}
}
local $translations->{$node_name};
# copy old values in
%{$translations->{$node_name}} = %$trans;
if (%$t) {
foreach my $key (keys %$t) {
$translations->{$node_name}{$key} = $t->{$key};
}
$trans = $translations->{$node_name};
}
# default: process children too.
my $pre = interpolate($node, $trans->{pre}) .
($trans->{showtag} ? start_tag($node) : '') .
interpolate($node, $trans->{prechildren});
my $post = interpolate($node, $trans->{postchildren}) .
($trans->{showtag} ? end_tag($node) : '') .
interpolate($node, $trans->{post});
if ($dokids) {
my $middle = '';
for my $kid ($node->getChildNodes()) {
if ($kid->isElementNode) {
$middle .= interpolate($node, $trans->{prechild}) .
_apply_templates($kid) .
interpolate($node, $trans->{postchild});
}
else {
$middle .= _apply_templates($kid);
}
}
return $pre . $middle . $post;
}
elsif ($search) {
my $middle = '';
for my $kid (findnodes($search, $node)) {
if ($kid->isElementNode) {
$middle .= interpolate($node, $trans->{prechild}) .
_apply_templates($kid) .
interpolate($node, $trans->{postchild});
}
else {
$middle .= _apply_templates($kid);
}
}
return $pre . $middle . $post;
}
else {
return $pre . $post;
}
}
sub start_tag {
my ($node) = @_;
my $name = $node->getName;
return '' unless $name;
my $string = "<" . $name;
foreach my $ns ($node->getNamespaceNodes) {
$string .= $ns->toString;
}
foreach my $attr ($node->getAttributeNodes) {
$string .= $attr->toString;
}
$string .= ">";
return $string;
}
sub end_tag {
my ($node) = @_;
if (my $name = $node->getName) {
return "</" . $name . ">";
}
else {
return '';
}
}
sub interpolate {
my ($node, $string) = @_;
return $string if $XPathScript::DoNotInterpolate;
my $new = '';
while ($string =~ m/\G(.*?)\{(.*?)\}/gcs) {
my ($pre, $path) = ($1, $2);
$new .= $pre;
$new .= $node->findvalue($path);
}
$string =~ /\G(.*)/gcs;
$new .= $1 if defined $1;
return $new;
}
1;
=pod "
=head1 TECHNICAL DOCUMENTATION
The rest of this POD documentation is B<not> useful to programmers who
just want to write stylesheets; it is of use only to people wanting to
call existing stylesheets or more generally embed the XPathScript
motor into some wider framework.
I<XML::XPathScript> is an object-oriented class with the following features:
=over
=item *
an I<embedded Perl dialect> that allows the merging of the stylesheet
code with snippets of the output document. Don't be afraid, this is
exactly the same kind of stuff as in I<Text::Template>, I<HTML::Mason>
or other similar packages: instead of having text inside Perl (that
one I<print()>s), we have Perl inside text, with a special escaping
form that a preprocessor interprets and extracts. For XPathScript,
this preprocessor is embodied by the I<xpathscript> shell tool (see
L</xpathscript Invocation>);
=item *
a I<templating engine>, that does the apply-templates loop, starting
from the top XML node and applying templates to it and its subnodes as
directed by the stylesheet.
=back
When run, the stylesheet is expected to fill in the I<template hash>
$t, which is a lexically-scoped variable made available to it at
preprocess time. Alternatively (FIXME: UNIMPLEMENTED), one can build
a template hash outside of the XPathScript Perl dialect and process
some XML document using it.
=head2 Dependencies
Although XPathScript is a core component of AxKit, which will not work
without this module, there is plenty of motivation for doing
stylesheets outside of a WWW application server and so
I<XML::XPathScript> is also distributed as a standalone CPAN module.
The AxKit XPathScript component inherits from this class and provides
the coupling with the application framework by overloading and adding
some methods.
I<XML::XPathScript> requires the following Perl packages:
=over
=item I<Symbol>
For generating a separate namespace in which code from the embedded
dialect will run.
=item I<XML::Parser>
=item I<XML::XPath>
For the XML parser and XPath interpreter, obviously needed. Plans are
to support the I<XML::libXML> package as an alternative, which does
the same as the above in C (and hence an order of magnitude faster).
=item I<File::Basename>
For fetching stylesheets from system files. One may provide other
means of fetching stylesheets through object inheritance (this is what
AxKit does).
=back
=head2 Global variables
Due to the peculiar syntax allowed in the embedded dialect for
accessing the template hash, the stylesheet is not reentrant and
cannot (yet) transform several documents at once.
=over
=item I<$XML::XPathScript::xp>
The XML::XPath object that holds the whole document (created by
L<XML::XPath/new>)
=item I<$XML::XPathScript::trans>
The template hash currently in use (known as $t in the AxKit
documentation). Its keys are element names, and its values are
the matching templates (as hash references).
=back
=cut "
package XML::XPathScript;
use vars qw($VERSION);
use XML::XPath 1.0;
use XML::XPath::XMLParser;
use XML::XPath::Node;
use XML::XPath::NodeSet;
use XML::Parser;
use Symbol;
use File::Basename;
$VERSION = '0.90';
=pod "
=head2 Methods and class methods
=over
=item I<< new(key1=>value1,key2=>value2,...) >>
Creates a new XPathScript translator. The recognized named arguments are
=over
=item xml => $xml
$xml is a scalar containing XML text, or a reference to a filehandle
from which XML input is available. An XML::XPathscript object without
an xml tag is only able to compile stylesheets (see L</SYNOPSIS>).
=item stylesheet => $stylesheet
$stylesheet is a scalar containing the stylesheet text, or a reference
to a filehandle from which the stylesheet text is available. The
stylesheet text may contain unresolved C<< <!--#include --> >>
constructs, which will be resolved relative to ".".
=item stylesheetfile => $filename
Same as I<stylesheet> but let I<XML::XPathScript> do the loading
itself. Using this form, relative C<< <!--#include --> >>s in the
stylesheet file will be honored with respect to the dirname of
$filename instead of "."; this provides SGML-style behaviour for
inclusion (it does not depend on the current directory), which is
usually what you want.
=item compiledstylesheet => $function
Re-uses a previous return value of I<compile()> (see L</SYNOPSIS>),
typically to apply the same stylesheet to several XML documents.
=back
=cut "
sub new {
my $class = shift;
die "Invalid hash call to new" if @_ % 2;
my %params = @_;
my $self = \%params;
bless $self, $class;
}
=pod "
=item I<process()>
=item I<process($outputroutine)>
Processes the document and stylesheet set at construction time, and
prints the result to STDOUT by default. If $outputroutine is set, it
must be either a reference to a filehandle open for output, or a
reference to a subroutine which does the output, as in
$xps->process(sub {print ANOTHERFD (shift);});
or
my $buffer="";
$xps->process(sub {$buffer.=shift;});
CODEME: the subroutine stuff is not done. It has to match parametric
conventions in the compiled handler (see L</compile>).
=cut "
sub process {
my $self = shift;
my ($refxml, $refstylesheet);
if (ref($self->{xml})) {
$refxml++;
}
if (ref($self->{stylesheet})) {
$refstylesheet++;
}
my $xpath = $refxml ?
XML::XPath->new(ioref => $self->{xml})
:
XML::XPath->new(xml => $self->{xml});
my $cv=$self->compile();
local $^W; # Don't moan on sloppyly written stylesheets
$cv->($xpath,sub {print shift});
}
=pod "
=item I<extract($stylesheet)>
=item I<extract($stylesheet,$printform)>
=item I<extract($stylesheet,$printform,$filename)>
=item I<extract($stylesheet,$printform,@includestack)> # from include_file() only
The embedded dialect parser. Given $stylesheet, which is either a
filehandle reference or a string, returns a string that holds all
the code in real Perl. Unquoted text and C<< <%= stuff %> >>
constructs in the stylesheet dialect is converted into invocations of
I<$printform>, which must be a function-like Perl form ( ``print'' by
default), while C<< <% stuff %> >> are transcripted verbatim.
C<< <!-- #include --> >> constructs are expanded by passing their
filename argument to L</include_file> along with @includestack (if any)
like this:
$self->include_file($includefilename,@includestack);
@includestack is not interpreted by I<extract()> (except for the first
entry, to create line tags for the debugger). It is only a bandaid for
I<include_file()> to pass the inclusion stack to itself across the
mutual recursion existing between the two methods (see
L</include_file>). If I<extract()> is invoked from outside
I<include_file()>, @includestack should be either empty or of size one.
This method does a purely syntactic job. No special framework
declaration is prepended for isolating the code in its own package,
defining $t or the like (L</compile> does that). It may be overriden
in subclasses to provide different escape forms in the stylesheet
dialect.
=cut "
sub extract {
my ($self,$stylesheet,$printform,@includestack) = @_;
$printform ||= "print";
my $filename=$includestack[0] || "stylesheet";
my $contents;
if (ref($stylesheet)) {
local $/;
$contents = <$stylesheet>;
}
else {
$contents = $stylesheet;
}
my $script="#line 1 $filename\n",
my $line = 1;
while ($contents =~ /\G(.*?)(<!--#include|<%=?)/gcs) {
my ($text, $type) = ($1, $2);
$line += $text =~ tr/\n//; # count \n's in text
$text =~ s/\|/\\\|/g;
$script .= "$printform(q|$text|);";
$script .= "\n#line $line $filename\n";
if ($type eq '<%=') {
$contents =~ /\G(.*?)%>/gcs || die "No terminating '%>' after line $line";
my $perl = $1;
$script .= "$printform( $perl );\n";
$line += $perl =~ tr/\n//;
}
elsif ($type eq '<!--#include') {
my %params;
while ($contents =~ /\G(\s+(\w+)\s*=\s*(["'])([^\3]*?)\3|\s*-->)/gcs) {
last if $1 eq '-->';
$params{$2} = $4;
}
if (!$params{file}) {
die "No matching file attribute in #include at line $line";
}
$script .= $self->include_file($printform,
$params{file},@includestack);
}
else {
$contents =~ /\G(.*?)%>/gcs || die "No terminating '%>' after line $line";
my $perl = $1;
$perl =~ s/;?$/;/s; # add on ; if its missing. As in <% $foo = 'Hello' %>
$script .= $perl;
$line += $perl =~ tr/\n//;
}
}
if ($contents =~ /\G(.*)/gcs) {
my ($text) = ($1);
$text =~ s/\|/\\\|/g;
$script .= "$printform(q|$text|);";
}
return $script;
}
=pod "
=item I<include_file($print_form,$filename)>
=item I<include_file($print_form,$filename,@includestack)>
Resolves a C<< <!--#include file="foo" --> >> directive on behalf of
I<extract()>, that is, returns the script contents of
I<$filename>. The return value must be de-embedded too, which means
that I<extract()> has to be called recursively to expand the contents
of $filename (which may contain more C<< <!--#include --> >>s etc.)
If $filename is relative (does not begin with "/" or "./"), it is
resolved according to the basename of the stylesheet that includes it
(that is, $includestack[0], see below) or "." if we are in the topmost
stylesheet. Filenames beginning with "./" are considered absolute;
this gives stylesheet writers a way to specify that they really really
want stylesheets that are in the system's current working directory.
@includestack is the include stack currently in use, made up of all
values of $filename through the stack, lastly added (innermost)
entries first. The toplevel stylesheet is not in @includestack
(e.g. the outermost call does not specify an @includestack).
$print_form is an opaque payload for I<extract()> which is passed down
to it across the mutual recursion that exists between both methods.
This method may be overridden in subclasses to provide support for
alternate namespaces (e.g. ``axkit://'' URIs).
FIXME: should handle topmost stylesheet better when we know its filename
(entails filename argument to new()).
=cut "
sub include_file {
my ($self, $print_form, $filename, @includestack) = @_;
# should maybe check for circular includes here...
#warn "INCLUDE: $filename\n";
if ($filename !~ m|^\.?/|) {
my $reldir;
# We guarantee that all values we insert into @includestack begin
# either with "/" or "./". This allows us to do the relative
# directory thing, and at the same time we get to safely ignore
# bizarre URIs inserted by inheriting classes.
if ($includestack[0] && $includestack[0] =~ m|^\.?/|) {
$reldir=dirname($includestack[0]);
} else {
$reldir=".";
};
$filename = "$reldir/$filename";
}
my $sym = gensym;
open($sym, $filename) || do {
use Carp;
Carp::croak "Can't read include file '$filename': $!";
};
return $self->extract($sym,$print_form,$filename,@includestack);
}
=pod "
=item I<compile()>
=item I<compile(varname1, varname2,...)>
Compiles the stylesheet set at I<new()> time and returns an anonymous
CODE reference. $stylesheet shall be written in the unparsed embedded
dialect (e.g. C<< ->extract($stylesheet) >> will be called first
inside I<compile()>).
$val1, $val2, etc. are extraneous arguments matching I<varname1>,
I<varname2>, etc. that will be made available to the stylesheet
dialect as lexically scoped variables. L</SYNOPSIS> shows a way to use
this feature to pass the Apache handler to AxKit XPathScript
stylesheets, which explains this feature better than a lengthy
paragraph would do.
I<Warning:> the remaining of the documentation for this method assumes
that you either are a functional programming monger, or that you have
read L</SYNOPSIS> first (describing functions that return functions
taking functions as arguments is never easy).
The return value is a handler function which does the rendering of any
XML document. It has the following prototype:
&$compiledfunc($xpath,$outputfunc,$val1,$val2,...);
$xpath is an I<XML::XPath> object, $outputfunc is a function that
takes one argument and prints it (or sends it to the browser, or to
the next stage of XML pipeline, or wherever it wants). $compiledfunc
will make repeated calls to $outputfunc until the complete document
is output.
CODEME: honor compiledstylesheet set at new() time and we're done with caching
=cut "
sub compile {
my ($self,@extravars) = @_;
my $stylesheet;
if (exists $self->{stylesheet}) {
$stylesheet=$self->{stylesheet};
} elsif (exists $self->{stylesheetfile}) {
# This hack fails if $self->{stylesheetfile} contains
# double quotes. I think we can ignore this and get
# away.
$stylesheet=qq:<!--#include file="$self->{stylesheetfile}" -->\n:;
} else {
die "Cannot compile without a stylesheet";
};
my $script = $self->extract($stylesheet);
#TRYME
# my $script = $self->extract($stylesheet,'&$_[1]');
my $package=gen_package_name();
my $eval = join("\n",
"package $package;",
"use XML::XPath::Node;",
'XML::XPathScript::Toys->import;',
'sub {',
'my (undef,undef,'.join('',@extravars).') = @_;',
'my $t = {};',
'local $XML::XPathScript::trans = $t;', # Yes,
# this does the sharing! Perl is a bizarre and
# wonderful language.
'local $XML::XPathScript::xp=$_[0];',
$script,
"}",
);
local $^W;
#warn "Compiling: $eval\n";
my $retval=eval $eval;
die $@ if (!defined $retval);
return $retval;
}
=pod "
=back
=head2 Utility functions
The functions below are not methods.
=over
=item I<XML::XPath::Function::document>
An XPath function made available to XPath expressions in the
stylesheet. DOCUMENTME.
=cut "
sub XML::XPath::Function::document {
my $self = shift;
my ($node, @params) = @_;
die "document: Function takes 1 parameter\n" unless @params == 1;
my $parser = XML::XPath::XMLParser->new();
my $results = XML::XPath::NodeSet->new();
my $newdoc;
my $sym = gensym;
my $file = $params[0];
open($sym, $file) || die "Cannot open document() file '$file': $!";
$newdoc = $parser->parse( ioref => $sym );
$results->push($newdoc) if $newdoc;
return $results;
}
=pod "
=item I<gen_package_name()>
Generates a fresh package name in which we would compile a new
stylesheet. Never returns twice the same name.
=cut "
do {
my $uniquifier;
sub gen_package_name {
$uniquifier++;
return "XML::XPathScript::STYLESHEET$uniquifier";
}
};
1;
__END__
=back
=head1 AUTHOR
Matt Sergeant, [EMAIL PROTECTED]
=head1 LICENSE
This is free software. You may distribute it under the same terms as
Perl itself.
=cut
# Local Variables:
# mode:cperl
# tab-width:8
# End:
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]