Thanks to everyone. I have already given up on this one. I used the Xpdf tools, which export the text and export the images ...
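For anyone who finds this thread later, here is a rough sketch of that kind of extraction run. It is not the exact commands used here. It assumes the Xpdf command-line tools `pdftotext` and `pdfimages` are on the PATH. The function names `build_commands` and `extract`, the file names, and the choice of the `-layout` option are illustrative assumptions only.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir):
    """Build the pdftotext and pdfimages command lines for one PDF.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    pdf = Path(pdf_path)
    out = Path(out_dir)
    # pdftotext <options> input.pdf output.txt
    # -layout asks the tool to keep the physical page layout as far as it can.
    text_cmd = ["pdftotext", "-layout", str(pdf), str(out / (pdf.stem + ".txt"))]
    # pdfimages input.pdf image-root (writes image-root-NNN.* files)
    image_cmd = ["pdfimages", str(pdf), str(out / pdf.stem)]
    return [text_cmd, image_cmd]


def extract(pdf_path, out_dir):
    """Run both tools, failing loudly if one of them is not installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_path, out_dir):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " not found on PATH")
        subprocess.run(cmd, check=True)


# Example call (hypothetical file name):
# extract("report.pdf", "out")
```

The output still has to be compared with the original PDF by eye, since neither tool can tell you whether the text came out in reading order or whether an "image" is really a whole rendered page.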
Unfortunately, the client was using the "Amyuni Document Converter" to generate the PDF files, so the exported text was unusable, and exporting the images was impossible because the embedded images weren't marked as such. So the exporter "exports" the whole page.

I did submit my recommendation that we convince the client to send the data to us in a more coherent format ...

Thanks again.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of bill lam
Sent: Thursday, January 29, 2009 11:28 AM
To: [email protected]
Subject: Re: [Jgeneral] Parsing PDF documents in J

On Wed, 28 Jan 2009, Alex Rufon wrote:
> I'll look into what openoffice is doing. :)

Sorry, my memory did not serve me. OpenOffice can export to PDF but cannot parse PDF into text. You should use pdftotext as suggested by John. If that program does not run under Windows, I guess you can install Cygwin to run it. You should always double-check by comparing with the original PDF, because the extracted text can come out in random order; otherwise there is a very good chance that you will be out of a job.

Some time ago I used OOo to batch convert all M$ Word documents to OOo odt format, and later converted all the odt files to LaTeX format, also using OOo.

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
唐詩036 王昌齡 塞上曲
蟬鳴空桑林 八月蕭關道
出塞復入塞 處處黃蘆草
從來幽并客 皆向沙場老
莫學遊俠兒 矜誇紫騮好
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
