On Friday 18 Jul 2014 16:35:11 John Palmer wrote:
> Terry, I've used pdftotext to good effect.
> pdftotext -layout will do a lot of what you want, but tabular matter is
> always a difficulty because it isn't a simple sequence of words.

Yes.  Tables and images were a problem when I tried pdftotext.
 
> You say 'because [the documents] need translating'; i.e. to another
> language or languages? or have I misunderstood?

Yes.  From French.

> Are there many illustrations and of what character?  Are they needed in
> the translated version(s) and if so will they need altering?

Yes.  But I'm more concerned with getting at the text inside the tables
without too much re-interpretation of what is what.

> There may be some mileage (and also some work) in converting the
> documents to a notation in which the logical structure (rather than the
> actual layout on the page) is indicated by mark-up.  I'm thinking mainly
> of LaTeX and its friends, though [X]HTML (used properly) has this
> character too.  Then all you have to do is to swap the text of each
> English paragraph or other unit of text (caption, for instance) for a
> Spanish (etc) text and the final layout adjusts itself to fit.
> If the target language is right-to-left or ideographic it's harder but
> still within the scope of TeX.

Yes.  As mentioned in my earlier post, I was able to get pdftohtml to do an 
excellent job.
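
For what it's worth, the "swap the text, keep the structure" idea John
describes can be sketched in a few lines of Python once the document is in
HTML.  This is only a rough illustration, assuming the HTML is reasonably
well-formed; the swap_text function, the TextSwapper class and the
translations table are all made up for the example, and a real job would
get its translations from a translator rather than a dictionary.

```python
from html import escape
from html.parser import HTMLParser


class TextSwapper(HTMLParser):
    """Re-emit HTML tags as found, replacing known text with translations.

    Hypothetical helper for illustration only. Comments and doctype
    declarations are not handled here and would be dropped.
    """

    def __init__(self, translations):
        super().__init__(convert_charrefs=True)
        self.translations = translations  # e.g. {"Nom": "Name"}
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also copied through verbatim
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Look up the text without surrounding whitespace, then swap it in
        # place so the original spacing around it is kept
        key = data.strip()
        if key in self.translations:
            data = data.replace(key, self.translations[key])
        # Character references were decoded on input, so re-escape on output
        self.out.append(escape(data, quote=False))


def swap_text(html, translations):
    """Return html with each translatable text node replaced."""
    parser = TextSwapper(translations)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    source = "<table><tr><td>Nom</td><td>Prix</td></tr></table>"
    print(swap_text(source, {"Nom": "Name", "Prix": "Price"}))
```

The table cells stay where they are and only their contents change, which
is the point: nobody has to work out afterwards which bit of text belonged
in which cell.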

-- 
        
Terry Coles

-- 
Next meeting:  Bournemouth, Tuesday, 2014-08-05 20:00
Meets, Mailing list, IRC, LinkedIn, ...  http://dorset.lug.org.uk/
New thread on mailing list:  mailto:dorset@mailman.lug.org.uk
How to Report Bugs Effectively:  http://goo.gl/4Xue
