Re: [RT:Long] Initial Results and comments (was Re: Compiling XML,and its replacement)

Berin Loritsch Fri, 04 Apr 2003 06:07:56 -0800

Stefano Mazzocchi wrote:

Considering we have a 5:1 size to time scaling ratio, it would be
interesting to see if it carries out to a much larger XML file--
if only I had one.  If scalability was linear, then a 1,580,000
byte file should only take .23 ms to parse.
Are you aware of the fact that no Java method can be larger than 64 KB of bytecode? And I'm also sure there is a limit on how many methods a Java class can have.

So, at the very end, you have a top-size limit on how big your compiled-in-memory object can be.


Absolutely.  However this is a stepping stone.  I haven't begun to look
at compiler optimizations yet.  I am trying to get the interface the way
I like it, and get the thing to merely function (which I managed last night!)

In this instance though, I believe that we are dealing with more than
just "unrolled loops."  We are dealing with file reading overhead, and
interpretation overhead.  Your *compressed* XML addresses the second
issue, but in the end I believe it will behave very similarly to my
solution.
Good point. But you are ignoring the fact that all modern operating systems have cached file systems. And, if this was not the case, it would be fairly trivial to implement one underneath a source resolver.


:) And yet certain operations touch the file and incorporate a call to
blocking filesystem code.  Seriously though, once a file is read into
memory, it's all about the parsing and processing.  With my solution
there is nothing to process--it's all been done.

Also keep in mind that improvements in the compiler design (far future)
can allow for repetitive constructs to be moved into a separate method.
For instance, the following XML is highly repetitive:

<snip/>

Still allowing for some level of hotspot action.

I see, also to overcome the 64 KB method limitation.

:) Yep.
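
As a rough sketch of the point above about moving repetitive constructs into a separate method: a compiled document is essentially a class that fires SAX events, so a repeated row can be emitted from one small helper instead of being unrolled inline. The names here (`CompiledRows`, `emitRow`, the `<rows>`/`<row>` elements) are hypothetical and are not the actual compiler output; only the standard `org.xml.sax` API is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical shape of a "compiled" document: the XML is replayed as
// hard-coded SAX calls, with the repeated <row> construct in one helper.
class CompiledRows {
    private static final Attributes NO_ATTRS = new AttributesImpl();

    // Replays <rows><row>one</row><row>two</row><row>three</row></rows>.
    static void toSAX(ContentHandler h) throws SAXException {
        h.startDocument();
        h.startElement("", "rows", "rows", NO_ATTRS);
        emitRow(h, "one");
        emitRow(h, "two");
        emitRow(h, "three");
        h.endElement("", "rows", "rows");
        h.endDocument();
    }

    // The shared helper: each call site costs a few bytes of bytecode in
    // toSAX instead of a full copy of the three events below.
    private static void emitRow(ContentHandler h, String text) throws SAXException {
        h.startElement("", "row", "row", NO_ATTRS);
        char[] chars = text.toCharArray();
        h.characters(chars, 0, chars.length);
        h.endElement("", "row", "row");
    }

    // Convenience for trying it out: records element names as they start.
    static List<String> startedElements() {
        final List<String> names = new ArrayList<>();
        try {
            toSAX(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    names.add(qName);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

Calling `CompiledRows.startedElements()` replays the document into a recording handler. A small, frequently called method like `emitRow` is also the kind of code a JIT can optimize once and reuse.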

However, I believe the true power of Binary XML will be with its
support for XMLCallBacks and (in the mid term future) decorators.

Can you elaborate more on this?


I just got this "working" in the sense that it is operational, not
in the sense that it is elegant, or where it needs to be.  Presently
I am using Processing Instructions to represent when a callback is
required.  What I want to do is allow actual XMLFragments be converted
to callbacks in the compiler.  That would allow direct support for
conventions such as XInclude or other standards.  Unfortunately, it
proved too difficult for the short term.

For now, what I have working is this:

<test>
  <element withAttribute="true"/>
  <document>Add some text here</document>

  <?include-xml ../../build.xml?>
</test>

When this document is compiled you get the standard SAX events that
you expect, but the processing instruction is compiled as an
XMLCallBack.  This proved to be the easiest thing from an implementation
perspective--but I am open to alternatives.
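
To make that concrete, here is a minimal hand-written sketch of what the compiled form of the document above might look like. It assumes a callback interface with a single `process` method that receives the processing instruction data and the downstream handler; that interface (`XMLCallBackSketch`), the class name `CompiledTestDocument`, and the stub `<included/>` element are all hypothetical, and the real XMLCallBack contract may differ. Whitespace-only text nodes are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical callback contract; the real XMLCallBack interface may differ.
interface XMLCallBackSketch {
    void process(String data, ContentHandler handler) throws SAXException;
}

// Hand-written stand-in for what compiling the <test> document might yield.
class CompiledTestDocument {
    private final XMLCallBackSketch includeXml;

    CompiledTestDocument(XMLCallBackSketch includeXml) {
        this.includeXml = includeXml;
    }

    void toSAX(ContentHandler h) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "withAttribute", "withAttribute", "CDATA", "true");
        char[] text = "Add some text here".toCharArray();

        h.startDocument();
        h.startElement("", "test", "test", new AttributesImpl());
        h.startElement("", "element", "element", attrs);
        h.endElement("", "element", "element");
        h.startElement("", "document", "document", new AttributesImpl());
        h.characters(text, 0, text.length);
        h.endElement("", "document", "document");
        // <?include-xml ../../build.xml?> is no longer a SAX event; it has
        // become a call into the callback at the same position.
        includeXml.process("../../build.xml", h);
        h.endElement("", "test", "test");
        h.endDocument();
    }

    // Demo with a stub callback that emits a single <included/> element.
    static List<String> demo() {
        final List<String> started = new ArrayList<>();
        XMLCallBackSketch stub = (data, out) -> {
            out.startElement("", "included", "included", new AttributesImpl());
            out.endElement("", "included", "included");
        };
        try {
            new CompiledTestDocument(stub).toSAX(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    started.add(qName);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return started;
    }
}
```

The static parts of the document cost nothing at runtime beyond the method calls; only the callback does any real work.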

The beauty of this approach is that CallBacks are much easier to
develop than something that works with SAX events on the fly.  I
have to add some more helper classes to make that statement true,
and your compressed XML would most likely be a key element of that.

However the concept is simple.  A document can be boiled down to
the parts that *never* change, and the elements that do change
are represented by easily developed code.  I'm thinking like a
developer, not a script kiddie.

The consequence of the design decisions is that we can never have
anything like [AJX]SP abuses like the following:

<xsp:logic>
  for (int i = 0; i < 10; i++)
  {
</xsp:logic>

<element/>

<xsp:logic>
  }
</xsp:logic>


That is valid (but *very* poorly written XSP).  The XML can be boiled
down to things like this:

<html>
  <head><title><?doc-title theme="coco"?></title></head>
  <body>
    <table>
      <tr><td><img src="logo.png"/></td>
          <td><?doc-title theme="coco"?></td></tr>
      <tr rowspan="2"><td><?site-tabs theme="coco"?></td></tr>
    </table>
    <table>
      <tr>
        <td><?site-menu theme="coco"?></td>
        <td><?doc-content theme="coco"?></td>
        <td><?site-tools theme="coco"?></td>
      </tr>
    </table>
  </body>
</html>

Notice the embedded processing instructions?  They would be set
to call certain callback methods which could be used to provide
a common look and feel to all the docs.

The processing instruction would have the callback name (which
will be accessible via the JAR Services mechanism), and the
proper theme is preserved throughout the document.
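
A minimal sketch of how that lookup could work, under stated assumptions: the processing instruction target is treated as the callback name, and the data is parsed as `name="value"` pseudo-attributes into a `Properties` object. The `NamedCallBack` interface and `CallBackRegistry` class are hypothetical, and the plain `Map` is only a stand-in for the JAR Services lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback contract: properties in, rendered text out.
interface NamedCallBack {
    String process(Properties props);
}

class CallBackRegistry {
    // Matches pseudo-attributes such as theme="coco" inside PI data.
    private static final Pattern PSEUDO_ATTR =
            Pattern.compile("([\\w.-]+)\\s*=\\s*\"([^\"]*)\"");

    // Stand-in for the JAR Services mechanism: callback name -> instance.
    private final Map<String, NamedCallBack> callbacks = new HashMap<>();

    void register(String target, NamedCallBack callback) {
        callbacks.put(target, callback);
    }

    // Turns 'theme="coco"' into a Properties object with theme=coco.
    static Properties parseData(String data) {
        Properties props = new Properties();
        Matcher m = PSEUDO_ATTR.matcher(data);
        while (m.find()) {
            props.setProperty(m.group(1), m.group(2));
        }
        return props;
    }

    // Resolves the PI target to a callback and runs it with the parsed data.
    String dispatch(String target, String data) {
        NamedCallBack callback = callbacks.get(target);
        if (callback == null) {
            throw new IllegalArgumentException("No callback registered for " + target);
        }
        return callback.process(parseData(data));
    }
}
```

For example, after registering a callback under `doc-title`, calling `dispatch("doc-title", "theme=\"coco\"")` hands that callback a `Properties` object whose `theme` entry is `coco`, so every callback in the document sees the same theme.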

It also means that certain things like the menu, tabs, content,
and tools can have the same logic but apply the specified
decorator (could be XSLTC, or could be something else).

The pipeline for this would be very simple:

<site:match pattern="*.html">
  <site:act name="choose-doc" source="{1}"/>
  <site:generate type="bxml" source="coco.xml"/>
  <site:serialize/>
</site:match>

It's been a while so I apologize if my sitemap logic is off.

But notice that there is no need for a transformer?

There is a lot of work to make my vision happen, but it should be
much more natural for developers to work with than trying to write
a transformer to intercept certain logic.  The code that the developer
would have to write would be much more compact and readable as well.

To make this a reality, the XMLRepository needs to be modified to
allow temporary storage of XMLFragments, and the compiler needs to
be altered to allow for different compilation strategies (i.e.
optimize for fragments, etc.)

Anyway, hopefully you will see some advantages in the approach.

The decorator concept will allow us to set a series of SAX events
for a common object.  This will render the XSLT stage a moot point
as we can apply pre-styled decorators to the same set of objects.

Isn't this what a translet (an xsltc-compiled XSLT stylesheet) was supposed to be?


You would know better.  However what I was thinking of is something more
along the lines of this:

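import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public interface XMLDecorator
{
    // emit the SAX events that represent o to the handler
    void transform( Object o, ContentHandler handler ) throws SAXException;
}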

In a directory renderer callback I might have code like this:

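import java.io.File;
import java.util.Properties;
import org.xml.sax.SAXException;

public class DirectoryCallBack
{
    // exclude all the init code

    // decorator that renders a directory; assigned in the omitted init code
    private XMLDecorator m_fileDecorator;

    public XMLFragment process( Properties props ) throws SAXException
    {
        File dir = new File( props.getProperty( "dir" ) );
        CompressedFragment xml = new CompressedFragment();

        // the decorator writes the directory's SAX events into the fragment
        m_fileDecorator.transform( dir, xml.contentHandler() );

        return xml;
    }
}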

The callback code is pretty simple.  I can easily create the callback,
and delegate the actual representation of the object to a decorator.
The object can be represented as XHTML directly, and it would be
embedded in the proper location.
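
To illustrate the decorator idea, here is a small self-contained sketch in which the same object (a `File`) is rendered by two differently configured decorators. The `transform( Object, ContentHandler )` shape follows the interface described above; the `void` return type, the `throws SAXException` clause, and the `FileNameDecorator` and `DecoratorDemo` classes are assumptions made for the example.

```java
import java.io.File;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Same shape as the interface in the message; return type and throws
// clause are assumptions.
interface XMLDecorator {
    void transform(Object o, ContentHandler handler) throws SAXException;
}

// Hypothetical decorator: wraps a file's name in one element (li, td, ...).
class FileNameDecorator implements XMLDecorator {
    private final String element;

    FileNameDecorator(String element) {
        this.element = element;
    }

    @Override
    public void transform(Object o, ContentHandler handler) throws SAXException {
        char[] name = ((File) o).getName().toCharArray();
        handler.startElement("", element, element, new AttributesImpl());
        handler.characters(name, 0, name.length);
        handler.endElement("", element, element);
    }
}

class DecoratorDemo {
    // Serializes the decorator's SAX events to a string (no escaping,
    // which is good enough for a demonstration).
    static String render(XMLDecorator decorator, Object o) {
        final StringBuilder out = new StringBuilder();
        try {
            decorator.transform(o, new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    out.append('<').append(qName).append('>');
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    out.append("</").append(qName).append('>');
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

With this, `DecoratorDemo.render(new FileNameDecorator("li"), new File("build.xml"))` yields `<li>build.xml</li>`, while a `"td"` decorator yields `<td>build.xml</td>` for the same object. The callback stays unchanged and only the decorator is swapped.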

Anyway, I'm happy to see new approaches to xml generation being researched.


I had the concept a long time ago, and I think it could fit quite well
in the Cocoon concept.  My goal is to replace XSP with a more programmer
friendly alternative--not to make Cocoon obsolete.
