Terry, I've used pdftotext to good effect.
pdftotext -layout (which tries to keep the original physical layout of
the page) will do a lot of what you want, but tabular matter is always
difficult because it isn't a simple sequence of words.

You say 'because [the documents] need translating': do you mean into
another language or languages, or have I misunderstood?
Are there many illustrations and of what character?  Are they needed in
the translated version(s) and if so will they need altering?

There may be some mileage (and also some work) in converting the
documents to a notation in which the logical structure (rather than the
actual layout on the page) is indicated by mark-up.  I'm thinking mainly
of LaTeX and its friends, though [X]HTML (used properly) has this
character too.  Then all you have to do is to swap the text of each
English paragraph or other unit of text (caption, for instance) for a
Spanish (etc.) text and the final layout adjusts itself to fit.
If the target language is right-to-left or ideographic it's harder but
still within the scope of TeX.
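
A minimal sketch of what I mean, not a recipe: the section title, the
sentences and the figure are all invented for illustration, and I'm
assuming a reasonably recent LaTeX with babel's Spanish support
installed.  Only the text and the babel option change between the
English and Spanish versions; the structure stays put.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}   % the English version would say [english]

\begin{document}

% English version: \section{Installation}
\section{Instalación}

% English version: Connect the power cable before switching the unit on.
Conecte el cable de alimentación antes de encender el aparato.

\begin{figure}[h]
  \centering
  % Placeholder box standing in for the real illustration (hypothetical).
  \fbox{\parbox{4cm}{\centering (illustration)}}
  % English version: \caption{Rear panel.}
  \caption{Panel trasero.}
\end{figure}

\end{document}
```

babel also takes care of the fixed words ("Figure" becomes "Figura")
and the hyphenation rules, so the caption label needs no hand editing.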

I've read about pandoc and I'll be interested to hear from someone who's
tried it.  I have mixed feelings about markdown.
Good luck.
John

-- 
John Palmer
Preston near Weymouth, Dorset, England
e-mail:  [email protected] (plain text preferred)
website: http://www.palmyra.me.uk/


-- 
Next meeting:  Bournemouth, Tuesday, 2014-08-05 20:00
Meets, Mailing list, IRC, LinkedIn, ...  http://dorset.lug.org.uk/
New thread on mailing list:  mailto:[email protected]
How to Report Bugs Effectively:  http://goo.gl/4Xue
