Re: Editing Text in the Intermediate Format

Andreas Delmelle Wed, 04 Jun 2008 05:43:55 -0700

On Jun 4, 2008, at 12:36, Martin Edge wrote:

Hi Martin

<snip />
When printing, each individual document is tracked via the barcode that I
print - and which case I need to (after FOP has decided what fits into what
page) rewrite the barcode place holders with the correct configuration for
the Document/Page number. Thus why my current work exists in the
Intermediate Format phase.
So - is it possible given my scenario to apply what you've said (or any
similar templating feature) at the FO or XSLT or Saxon phase?

Not really at those points in the process, if I understand your question and description of the setup correctly.


The complete process would still look like:

XML + XSLT -> FO
  -> initial Area Tree + XSLT
  -> modified Area Tree
  -> PDF/PS/PCL

The main point was that the modification of the initial area tree is probably best done by means of another stylesheet. It's the easiest way to manipulate semantic XML, and as demonstrated, requires relatively little extra coding. The amount of extra work is determined by how much you actually need to change in the initial result. Saxon should have no problem transforming your 800MB area tree, although it will obviously take time...
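To illustrate how little code such a second stylesheet needs, here is a minimal sketch: an identity transform plus one overriding template, applied through the standard JAXP API. The element names (areaTree, pageViewport, word) are simplified stand-ins for a real area tree, and the "@@BARCODE@@" token and "PAGE-" prefix are hypothetical placeholders, not anything FOP emits by itself. The stylesheet is kept to XSLT 1.0 so it runs on the JDK's built-in processor as well as on Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class AreaTreePatchSketch {

    // Identity transform plus a single override for placeholder words.
    // Element names and the "@@BARCODE@@" token are assumptions made
    // for this example only.
    static final String PATCH_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      // Copy everything unchanged by default.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Replace the placeholder text with a value derived from the page position.
      + "<xsl:template match=\"word[. = '@@BARCODE@@']\">"
      + "<xsl:copy><xsl:apply-templates select='@*'/>"
      + "<xsl:text>PAGE-</xsl:text>"
      + "<xsl:value-of"
      + " select='count(ancestor::pageViewport/preceding::pageViewport) + 1'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the patch stylesheet to an area-tree-like XML string.
    static String patch(String areaTreeXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(PATCH_XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(areaTreeXml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String sample =
            "<areaTree>"
          + "<pageViewport><word>@@BARCODE@@</word><word>Hello</word></pageViewport>"
          + "<pageViewport><word>@@BARCODE@@</word></pageViewport>"
          + "</areaTree>";
        System.out.println(patch(sample));
    }
}
```

Everything the stylesheet does not explicitly match passes through untouched, so the amount of XSLT grows only with the number of things that actually need changing.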

I'm not too familiar with the idioms for chaining such transformations in .NET, but I know for a fact that the above can be implemented in Java without writing intermediate files to disk or having the entire intermediate documents in memory. The fact that you're using a mixture of .NET and a Java application may complicate things here, but I'm thinking:


1) XML + XSLT -> FO (using Saxon)
2) FO -> initial Area Tree (using FOP)
3) initial Area Tree + XSLT -> modified Area Tree (using Saxon)
4) modified Area Tree -> PDF/PS/PCL (using FOP)

In theory, if you use Saxon for Java, all those steps can be handled in the same Java VM session by writing a small custom Java wrapper. For hints on how to go about that, see the example code on the website: such a wrapper would look a lot like a mixture of the sample code for using the IF and ExampleXML2PDF(*).

The output emitted by Saxon's Transformer can immediately be piped through to FOP for the initial rendering (see ExampleXML2PDF: you would need MimeConstants.MIME_FOP_AREA_TREE instead of MimeConstants.MIME_PDF). Analogously, the OutputStream --which does not necessarily have to be FileOutputStream-- passed to FOP's Renderer for the initial rendering can serve as the Source for another Transformer which is associated with the second stylesheet. Finally, the Result of that transform would in turn be used as a Source for FOP's AreaTreeParser for the final rendering.
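The chaining mechanism itself can be sketched with plain JAXP. The example below connects two stylesheets so that the first stage's output reaches the second as SAX events, with no intermediate file or serialized string. It deliberately leaves out the FOP stages, since those need the FOP jars; in the real pipeline FOP would sit between the two transforms and after the second one. The class name and the two tiny stylesheets are made up for the illustration. Note that an XSLT processor may still build its own internal tree of the events it receives, so how much is held in memory depends on the processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ChainedTransformSketch {

    static final String XSL_NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    // Stage one (stand-in for XML + XSLT -> FO): renames item to block.
    static final String FIRST_XSLT =
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
      + "<xsl:template match='/doc'><root><xsl:apply-templates/></root></xsl:template>"
      + "<xsl:template match='item'><block><xsl:value-of select='.'/></block></xsl:template>"
      + "</xsl:stylesheet>";

    // Stage two (stand-in for the area tree patch): marks every block.
    static final String SECOND_XSLT =
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='block'>"
      + "<block patched='yes'><xsl:apply-templates/></block>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String inputXml, String firstXslt, String secondXslt)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        if (!factory.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAX-based chaining is not supported");
        }
        SAXTransformerFactory saxFactory = (SAXTransformerFactory) factory;

        // The second stage is a ContentHandler: it consumes SAX events
        // and writes the final result.
        TransformerHandler second = saxFactory.newTransformerHandler(
            new StreamSource(new StringReader(secondXslt)));
        StringWriter out = new StringWriter();
        second.setResult(new StreamResult(out));

        // The first stage sends its output straight into the second stage.
        Transformer first = saxFactory.newTransformer(
            new StreamSource(new StringReader(firstXslt)));
        first.transform(
            new StreamSource(new StringReader(inputXml)),
            new SAXResult(second));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run("<doc><item>a</item></doc>", FIRST_XSLT, SECOND_XSLT));
    }
}
```

The same SAXResult idea is what lets a Transformer feed another component directly: anything that exposes a SAX ContentHandler can take the place of the second handler here.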

This could avoid writing the FO or area trees to disk, and eliminates the cost of multiple JVM warmups that you're stuck with if you call out to FOP multiple times using the console from within your .NET application (if this is how you currently handle it?). You would instead call out only once, to the custom wrapper application. To go even further, the wrapper could be set up as a web-service, which would make it possible to use cached Templates (parsing the stylesheets only once), and avoid JVM warmup for all but the very first run. Calling out to the console would then be replaced by sending an HTTP request to whatever server:port hosts the service.
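As a rough sketch of the cached Templates idea: a long-running wrapper could compile each stylesheet once and hand out a fresh Transformer per request. The class below is hypothetical (its name, the string cache keys, and the compile counter are all made up for illustration), but it uses only standard JAXP calls. A Templates object is safe to share between threads, whereas a Transformer is not, which is why a new one is created for every run.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplatesCacheSketch {

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compileCount = new AtomicInteger();

    // Returns the compiled stylesheet for a name, compiling it on first use only.
    Templates get(String name, String xslt) {
        return cache.computeIfAbsent(name, key -> compile(xslt));
    }

    // TransformerFactory is not guaranteed to be thread-safe, so compilation
    // is serialized; cache hits never reach this method.
    private synchronized Templates compile(String xslt) {
        try {
            Templates templates =
                factory.newTemplates(new StreamSource(new StringReader(xslt)));
            compileCount.incrementAndGet();
            return templates;
        } catch (TransformerConfigurationException e) {
            throw new IllegalArgumentException("Stylesheet could not be compiled", e);
        }
    }

    // Runs one transformation with a fresh Transformer from the cached Templates.
    String transform(String name, String xslt, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        get(name, xslt).newTransformer().transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(out));
        return out.toString();
    }

    // Number of stylesheet compilations so far; only here to show the cache working.
    int compileCount() {
        return compileCount.get();
    }
}
```

Calling transform twice with the same name compiles the stylesheet once; a service would typically key the cache on the stylesheet's path or URL rather than an arbitrary name.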

Of course, this is a rather simplistic sketch, which may raise more questions than it answers, and I have no idea where the initial XML and XSLT (XSL-FO) in your setup come from. If they exist as files on disk somewhere, the above could be relatively simple to implement. If they're only exposed to you as .NET objects, that would obviously make it more complicated.


(*) http://xmlgraphics.apache.org/fop/0.95/intermediate.html#usage

http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/examples/embedding/java/embedding/ExampleXML2PDF.java?view=markup



HTH!

Andreas
