Conal Tuohy wrote:

Antonio Fiol Bonnín wrote:


a) Refactoring SimpleLuceneXMLIndexerImpl so that its private method
indexDocument is not private, and taking it to an external component.

b) Creating a PDFGenerator (in the cocoon sense of generator, of course).

Option (a) seems to be giving us more headaches than pleasure, and
option (b) seems cleaner to a certain point. Option (b) would allow us to
follow links in the PDF file, if developed to that point.


I like option (b) too. You could start with plain text, but it could later be 
developed to extract basic formatting, hyperlinks, bookmarks (the table of contents), 
images, etc.


However, option (b) implies choosing a format for its output (which?),


An interesting question. Perhaps html, and begin with an implementation which produces:

<html>
<head/>
<body>
blah blah blah<br/>
blah blah<br/>
<br class="page"/>
... </body>
</html>

Later you (or someone else) could add extra things as they need them.
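
As a rough illustration of that first plain-text stage, here is a minimal Java sketch that wraps already-extracted page text in the HTML shown above. It is not Cocoon code: the class name PdfTextToHtmlSketch and its methods are hypothetical, the text extraction itself is assumed to be done elsewhere by a PDF library of your choice, and a real generator would presumably emit SAX events rather than build a string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Cocoon: shows only the output format.
public class PdfTextToHtmlSketch {

    /** Escapes the characters that may not appear raw in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Renders extracted PDF text as the minimal HTML proposed above.
     * Each inner list holds the text lines of one page (an assumption
     * about how the extraction step delivers its result).
     */
    static String toHtml(List<List<String>> pages) {
        StringBuilder out = new StringBuilder("<html>\n<head/>\n<body>\n");
        for (List<String> page : pages) {
            for (String line : page) {
                out.append(escape(line)).append("<br/>\n");
            }
            // Mark the end of each page so later stages can find page breaks.
            out.append("<br class=\"page\"/>\n");
        }
        return out.append("</body>\n</html>\n").toString();
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
            Arrays.asList("blah blah blah", "blah blah"),
            Arrays.asList("second page, with an & and a <tag>"));
        System.out.print(toHtml(pages));
    }
}
```

Keeping the page marker as a plain br element with a class means the output stays valid for any HTML-aware indexer, and richer markup (links, bookmarks) can be added later without changing the overall shape.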

Alternatively, you could use a more PDF-oriented DTD.

What about using XSL-FO? It would be pretty cool to be able to transform PDF into FO, basically reversing what FOP does. I know it would be pretty painful to make it work with all kinds of PDF, but for reasonable ones it shouldn't be that hard (PDF is sort of a markup language already).

--
Stefano.

