Terry, I've used pdftotext to good effect. pdftotext -layout will do a lot of what you want, but tabular matter is always a difficulty because it isn't a simple sequence of words.
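For what it's worth, here is a minimal sketch of running pdftotext -layout from a script, in case there are many documents to get through. The helper names (build_pdftotext_cmd, extract_with_layout) are my own invention for illustration, and it assumes pdftotext is installed and on the PATH. It won't cure the tabular-matter problem; it only automates the command.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path, txt_path):
    """Return the argument list for a layout-preserving pdftotext run."""
    # -layout asks pdftotext to keep the physical layout of the page
    # as far as it can, rather than emitting text in reading order.
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def extract_with_layout(pdf_path, txt_path=None):
    """Run pdftotext -layout on one PDF and return the output path.

    Hypothetical helper: by default the text file is written next to
    the PDF, with the suffix changed to .txt.
    """
    pdf_path = Path(pdf_path)
    if txt_path is None:
        txt_path = pdf_path.with_suffix(".txt")
    # Assumption: pdftotext is available on the PATH.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path), check=True)
    return Path(txt_path)
```

Calling extract_with_layout("manual.pdf") should then leave manual.txt beside the PDF (the file name is only an example).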
You say 'because [the documents] need translating'; that is, into another language or languages? Or have I misunderstood? Are there many illustrations, and of what character? Are they needed in the translated version(s), and if so will they need altering?

There may be some mileage (and also some work) in converting the documents to a notation in which the logical structure (rather than the actual layout on the page) is indicated by mark-up. I'm thinking mainly of LaTeX and its friends, though [X]HTML (used properly) has this character too. Then all you have to do is swap the text of each English paragraph or other unit of text (a caption, for instance) for a Spanish (etc.) text, and the final layout adjusts itself to fit. If the target language is right-to-left or ideographic it's harder, but still within the scope of TeX.

I've read about pandoc and I'll be interested to hear from someone who's tried it. I have mixed feelings about markdown.

Good luck.

John

--
John Palmer
Preston near Weymouth, Dorset, England
e-mail: [email protected] (plain text preferred)
website: http://www.palmyra.me.uk/

--
Next meeting: Bournemouth, Tuesday, 2014-08-05 20:00
Meets, Mailing list, IRC, LinkedIn, ... http://dorset.lug.org.uk/
New thread on mailing list: mailto:[email protected]
How to Report Bugs Effectively: http://goo.gl/4Xue

