Re: [Cuneiform] PDF output

René Rebe Thu, 04 Sep 2008 02:34:01 -0700

Hi,

> > In which I would start by making a copy of html.cpp to add the corresponding
> > PDF tag writeouts, using ExactImage
> > (http://www.exactcode.de/site/open_source/exactimage/)
> > for the actual PDF structure generation. (ExactImage SVN:HEAD only includes
> > very static pure image writing, but I already rewrote that part and have any
> > vector, font, image and multi-page writing in my local working copy, 
> > already).
> >
> > Any hints welcome,
>
> Exactimage is GPL code. Linking to it is legal but would contaminate
> Cuneiform (which is BSD). For this reason I can't accept it into
> trunk.


Yes, that occurred to me as well after I posted. I guess I overlooked
that part because I so seldom come into contact with BSD-licensed code :-)

...

> The easiest way to get PDF output is to convert the RTF output to PDF.
> I would imagine that there are already programs that do this. I'm also
> looking into adding the layout information to the HTML exporter using
> hOCR format. Having a hOCR -> PDF converter would probably be
> beneficial outside Cuneiform as well.

Yes, I agree about the hOCR point. However, I think RTF lacks the
exact positioning that a PDF writer needs to layer the text behind the
image in the final PDF.
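
To illustrate why exact positioning matters, here is a minimal sketch (not
ExactImage code) of placing one recognized word behind a page image. It
assumes pixel bounding boxes with a top-left origin, a known scan
resolution, and a font already registered under the hypothetical resource
name /F1. The function names are made up for this example. PDF user space
uses points (1/72 inch) with a bottom-left origin, and text rendering mode
3 ("3 Tr") draws text invisibly while keeping it searchable.

```python
def bbox_to_pdf_rect(bbox, dpi, page_height_px):
    """Map a pixel bbox (x0, y0, x1, y1) with a top-left origin to
    (left, bottom, width, height) in PDF points with a bottom-left origin."""
    scale = 72.0 / dpi  # points per pixel
    x0, y0, x1, y1 = bbox
    left = x0 * scale
    bottom = (page_height_px - y1) * scale  # flip the y axis
    width = (x1 - x0) * scale
    height = (y1 - y0) * scale
    return left, bottom, width, height


def invisible_text_ops(text, left, bottom, font_size, font="/F1"):
    """Return PDF content-stream operators that place `text` invisibly.

    Assumptions: ASCII-only text, and `font` names a font resource that the
    page already declares (hypothetical here)."""
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        "BT\n"
        "3 Tr\n"                                # render mode 3: invisible
        f"{font} {font_size:.2f} Tf\n"          # select font and size
        f"{left:.2f} {bottom:.2f} Td\n"         # move to the word's origin
        f"({escaped}) Tj\n"                     # show the string
        "ET\n"
    )
```

Using the box height as the font size is only a rough approximation; a real
writer would also scale the text horizontally so that it spans the box width.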

I'll now add an hOCR (HTML) parser to the PDF writer of ExactImage,
so that one can feed the formatting stream with bounding boxes from
"any" hOCR program and obtain a searchable PDF.
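
As a rough sketch of the parsing step (again not the ExactImage
implementation), the following reads word bounding boxes out of hOCR using
only the Python standard library. It assumes words are marked up as elements
with class "ocrx_word" whose title attribute carries a "bbox x0 y0 x1 y1"
property, as common OCR engines emit, and that word elements contain no void
tags such as <br>. The names HocrWordParser and parse_hocr_words are
hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._text = []     # text fragments of the current word
        self._depth = 0     # nesting depth of tags inside the current word

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # e.g. <strong> nested inside a word
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        match = BBOX_RE.search(attr.get("title") or "")
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(g) for g in match.groups())
            self._text = []
            self._depth = 0

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        text = "".join(self._text).strip()
        if text:
            self.words.append((text, self._bbox))
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def parse_hocr_words(hocr):
    """Return a list of (text, bbox) tuples found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

Each returned box is in image pixels with a top-left origin, so it still has
to be converted to PDF coordinates before the text is written.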

