Alex,
How is your XSLT? I seem to be seeing it all over the place today. There are
presentations at the coming CFUN04 by Michael Dinowitz and April Fleming on
working with it. One of the outputs of FOP is txt or RTF; in the process it
must convert the doc at some stage to XSL-FO using XSLT. I'm assuming the doc
can then be converted by FOP to PDF, RTF or text. I may be incorrect in assuming
this, and it might not be possible, as you suggest, to step back from the PDF
to XSL-FO and then use the engine to output txt or RTF. Having worked with this
before, you probably know more about it than I :)
For sure though, if you can find a way to change the doc into XML then afaik it
is fairly straightforward to use an XSLT transformation to do what you want.
Apparently it's the in thing to grab anything in XML, whether it's emails from
Google using XML or search applications: grab the XML, apply XSLT and you can
do anything with it, put it into a database, make an RTF of it.
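To give a rough idea of the "grab the XML, apply XSLT" step, here is a minimal sketch using Java's standard javax.xml.transform API. The <doc>/<para> element names, the stylesheet and the class name XmlToText are all made up for illustration; a real document would need its own stylesheet, and this says nothing about getting from PDF to XML in the first place.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class XmlToText {
    // Hypothetical stylesheet: emits each <para> under <doc> as one line of plain text.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/doc\">"
      + "<xsl:for-each select=\"para\">"
      + "<xsl:value-of select=\"normalize-space(.)\"/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given XSLT to the given XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input, standing in for whatever XML you managed to grab.
        String xml = "<doc><para>Hello from XML</para><para>Second line</para></doc>";
        System.out.print(transform(xml, XSLT));
    }
}
```

Swapping the stylesheet is all it takes to target a different output, which is probably why the approach is so popular.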
"FOP uses the
standard XSL-FO file format as input, lays the content out into pages, then
renders it to the requested output. One great advantage to using XSL-FO as input
is that XSL-FO is itself an XML file, which means that it can be conveniently
created from a variety of sources. The most common method is to convert semantic
XML to XSL-FO, using an XSLT transformation."
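The "semantic XML to XSL-FO" step from that quote might look something like the sketch below, again with Java's standard javax.xml.transform API. The <doc>/<para> input, the page master name "page" and the class name XmlToFo are assumptions for illustration only, and handing the resulting XSL-FO to FOP for rendering is not shown here.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class XmlToFo {
    // Hypothetical stylesheet: wraps each <para> of a <doc> in an fo:block
    // inside a single-page-master XSL-FO document.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<xsl:template match=\"/doc\">"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"page\">"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"page\">"
      + "<fo:flow flow-name=\"xsl-region-body\">"
      + "<xsl:for-each select=\"para\">"
      + "<fo:block><xsl:value-of select=\".\"/></fo:block>"
      + "</xsl:for-each>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the XML and returns the XSL-FO document as a string.
    static String toFo(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input; the printed XSL-FO is what a formatter would take as input.
        System.out.println(toFo("<doc><para>Hello</para></doc>"));
    }
}
```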
OT: Microblast html2text is excellent, but you need PDF to text :)
I'm looking forward to CFUN04 :)
Colm
