I just tried it. Each line of the extracted text is wrapped in a paragraph tag, and each page is wrapped in div tags. That's all. No other HTML tags are used.

<p>Installing Java DB
</p>
<p>Java DB is installed automatically as part of the Java SE Development Kit (JDK).
</p>
<p>To obtain the JDK, navigate your web browser to
</p>
<p>http://www.oracle.com/technetwork/java/javase/downloads/ and click the Download JDK
</p>
<p>button. Follow the instructions on subsequent pages.
</p>
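
To make that structure concrete, here is a rough sketch of the wrapping I'm describing. This is not PDFBox's code, only an illustration of the output shape; the class name PageMarkupSketch and the method toHtml are made up for this example, and it does no HTML escaping.

```java
import java.util.List;

public class PageMarkupSketch {
    // Hypothetical helper: wraps each line in <p>...</p> and each page in
    // <div>...</div>, with the closing </p> on its own line as in the sample.
    public static String toHtml(List<List<String>> pages) {
        StringBuilder sb = new StringBuilder();
        for (List<String> page : pages) {
            sb.append("<div>\n");
            for (String line : page) {
                sb.append("<p>").append(line).append("\n</p>\n");
            }
            sb.append("</div>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One page holding two lines taken from the sample output.
        List<List<String>> pages = List.of(
            List.of("Installing Java DB",
                    "Java DB is installed automatically as part of the Java SE Development Kit (JDK)."));
        System.out.print(toHtml(pages));
    }
}
```

Because a paragraph tag is emitted per extracted line rather than per logical paragraph, a sentence that wraps in the PDF ends up split across several <p> elements, as the sample shows.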

Kim

On 01/10/12 12:15 PM, Andrew McIntyre wrote:
On Tue, Jan 10, 2012 at 5:46 AM, Rick Hillegas <[email protected]> wrote:

I ran a quick experiment: I removed fo2html.xsl and verified that I could
build the frames html docs. Here are some solutions listed in declining
order of effort:

<snip options>

Thanks,
-Rick

Another option would be to use PDFBox's ExtractText utility to convert
the PDFs generated by FOP into HTML:

http://pdfbox.apache.org/commandlineutilities/ExtractText.html

I haven't tried it yet, so I can't speak to its accuracy or
presentation, but it would be another easy solution, and it's
definitely licensed under the Apache License. :-)

- andrew
