Thanks to everyone.

I have already given up on this one. I used the Xpdf tools, which export the 
text and the images ...

Unfortunately, the client was using the "Amyuni Document Converter" to generate 
the PDF files. The exported text came out unusable, and exporting the images 
was impossible because the embedded images weren't marked as such, so the 
exporter "exports" the whole page instead.

I did submit my recommendation that we convince the client to send us the data 
in a more coherent format ... 

Thanks again.


-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of bill lam
Sent: Thursday, January 29, 2009 11:28 AM
To: [email protected]
Subject: Re: [Jgeneral] Parsing PDF documents in J

On Wed, 28 Jan 2009, Alex Rufon wrote:
> I'll look into what openoffice is doing. :)

Sorry, my memory did not serve me. OpenOffice can export to PDF but
cannot parse PDF into text.  You should use pdftotext as suggested by
John.  If that program does not run under Windows, I guess you could
install Cygwin to run it. You should always double-check by comparing
with the original PDF, because the extracted text can come out in
random order; otherwise there is a very good chance that you will be
out of a job.
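
For what it's worth, the extraction step could be scripted roughly as
below. This is only a sketch, not a tested recipe: Python is used purely
for illustration, the function names pdftotext_command and extract_text
are made up, and it assumes Xpdf's pdftotext is on the PATH. The -layout
and -enc options belong to pdftotext, but check the man page of your
version. The output still has to be compared with the original PDF by eye.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout, which
        # can help when the extracted text comes out in a strange order.
        cmd.append("-layout")
    # -enc selects the output text encoding.
    cmd += ["-enc", "UTF-8", str(pdf_path), str(txt_path)]
    return cmd


def extract_text(pdf_path, txt_path=None, keep_layout=True):
    """Run pdftotext on one file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    pdf_path = Path(pdf_path)
    # Default output name (an assumption): same name with a .txt suffix.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    subprocess.run(pdftotext_command(pdf_path, txt_path, keep_layout), check=True)
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Calling extract_text("report.pdf") (a made-up file name) would write
report.txt next to the PDF and return its contents for checking.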

Some time ago I used OOo to batch-convert all MS Word documents to
the OOo odt format, and later converted all the odt files to LaTeX
format, also using OOo.

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
唐詩036 王昌齡  塞上曲
    蟬鳴空桑林  八月蕭關道  出塞復入塞  處處黃蘆草
    從來幽并客  皆向沙場老  莫學遊俠兒  矜誇紫騮好
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
