On Friday 18 Jul 2014 15:35:58 Terry Coles wrote:
> On Friday 18 Jul 2014 15:02:50 Andrew Montgomery-Hurrell wrote:
> > You can try pdftohtml[1] to get it into HTML format, from there it should
> > be easier to convert into a document format you want using something like
> > pandoc[2].
> >
> > [1]: http://pdftohtml.sourceforge.net/
> > [2]: http://johnmacfarlane.net/pandoc/README.html
>
> That was incredibly fast :-)
>
> Unfortunately, for some reason the conversion inserted lots of unprintable
> chars; eg :
>
> PRÉAMBULE
>
> pdftotext did a better job of accurately converting the text, but lost all
> the formatting :-(

What worked was:

pdftohtml -c -s <pdffile>

It did a lovely job of retaining both the formatting and the text. When I get
to work on Monday, I'll see if the translation company can work with the HTML.
If not, I'll see what happens if I open the HTML in Word (LibreOffice
demolished it).
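
In case anyone wants to script the pdftohtml call for several files, here is a minimal sketch in Python. It assumes pdftohtml (from poppler-utils) is installed and on the PATH; the function names build_command and convert are made up for illustration and are not part of any tool. I have only used the two flags from the command above: -c for complex output that tries to keep the layout, and -s for a single HTML document covering all pages.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path):
    """Return the argument list for: pdftohtml -c -s <pdffile>."""
    # -c: complex output (tries to preserve the page layout)
    # -s: one HTML document containing all pages
    # build_command is a hypothetical helper name, not part of pdftohtml.
    return ["pdftohtml", "-c", "-s", str(pdf_path)]


def convert(pdf_path):
    """Run pdftohtml on one PDF; raise if the tool is missing or fails."""
    # Assumption: pdftohtml is installed and reachable via PATH.
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found on PATH")
    # Passing a list (no shell) keeps filenames with spaces intact.
    subprocess.run(build_command(Path(pdf_path)), check=True)
```

Calling convert("somefile.pdf") should then behave like typing the command by hand; where the output files end up is left to pdftohtml's own defaults.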
--
Terry Coles