Hi,

> > In which I would start by making a copy of html.cpp to add the corresponding
> > PDF tag writeouts, using ExactImage
> > (http://www.exactcode.de/site/open_source/exactimage/)
> > for the actual PDF structure generation. (ExactImage SVN:HEAD only includes
> > very static pure image writing, but I already rewrote that part and have any
> > vector, font, image and multi-page writing in my local working copy,
> > already).
> >
> > Any hints welcome,
>
> Exactimage is GPL code. Linking to it is legal but would contaminate
> Cuneiform (which is BSD). For this reason I can't accept it into
> trunk.

Yes, that came to mind as well after I posted. I guess I overlooked it because I so seldom come into contact with BSD-licensed code :-)

...

> The easiest way to get PDF output is to convert the RTF output to PDF.
> I would imagine that there are already programs that do this. I'm also
> looking into adding the layout information to the HTML exporter using
> hOCR format. Having a hOCR -> PDF converter would probably be
> beneficial outside Cuneiform as well.

Yes, I agree about the hOCR point. However, I think RTF will lack the exact
positioning that a PDF writer needs to layer the text behind the image in the
final PDF.

I'll now add an hOCR (HTML) parser to the PDF writer of ExactImage, so that one
can feed the formatting stream with bounding boxes from "any" hOCR program and
obtain a searchable PDF.

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~cuneiform
More help   : https://help.launchpad.net/ListHelp
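
To make the hOCR idea concrete, here is a rough sketch of the parsing step, written in Python for brevity rather than in ExactImage's C++. It is not the ExactImage implementation. The names `HocrWordParser`, `parse_hocr` and `bbox_to_pdf` are made up for illustration. The sketch assumes the OCR program emits `ocrx_word` spans whose `title` attribute carries `bbox x0 y0 x1 y1` in pixels with the origin at the top left, and that the PDF page uses the default user space (1/72 inch units, origin at the bottom left).

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 10 20 100 50; x_wconf 90".
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
# Elements that never get a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every element of one hOCR class."""

    def __init__(self, target_class="ocrx_word"):
        super().__init__()
        self.target_class = target_class
        self.words = []
        self._bbox = None   # bbox of the element currently being collected
        self._depth = 0     # nesting depth inside that element
        self._text = []

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> carry no text

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._bbox is not None:
            self._depth += 1
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        match = BBOX_RE.search(attr.get("title") or "")
        if self.target_class in classes and match:
            self._bbox = tuple(int(v) for v in match.groups())
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, self._bbox))
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def parse_hocr(html, target_class="ocrx_word"):
    parser = HocrWordParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.words


def bbox_to_pdf(bbox, page_height_px, dpi):
    """Convert a top-left-origin pixel bbox to PDF (x, y, width, height) in points."""
    x0, y0, x1, y1 = bbox
    scale = 72.0 / dpi
    return (x0 * scale, (page_height_px - y1) * scale,
            (x1 - x0) * scale, (y1 - y0) * scale)
```

A PDF writer could then place each word as invisible text at the converted position, behind or on top of the page image, which is the exact positioning that the RTF route does not provide.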

