Hi David,

You're right, it's not pretty. In fact it's quite ugly: if you use Adobe's export to RTF or export to text, you lose a lot of information. This is why we went through the Adobe PDF API to get at the information contained within a PDF document.

It is possible to extract information from PDF files and get useful XML output that can be used to populate DocBook. But you have to make some assumptions, which vary depending on the type of document you are processing (resume, chapter, article, web page, etc.). If you have a sense of this, you can list the assumptions as sets of rules and run transformations based on those rules.

Thanks,
-Riz

------------------------------
Riz Virk, (617) 905-3518
[EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.xyztechnologies.com

-----Original Message-----
From: David Cramer [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 13, 2002 1:22 PM
To: [EMAIL PROTECTED]
Subject: RE: DOCBOOK: converting to docbook

As a last resort, if the source files for the pdfs aren't available,
recent versions of acrobat can save/export to rtf and text. Not a pretty
sight tho.

David

> -----Original Message-----
> From: Bob Stayton [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 13, 2002 11:38 AM
> To: jonathon; [EMAIL PROTECTED]
> Subject: Re: DOCBOOK: converting to docbook
>
> For your PDF documents, I'd look for the source document
> that generated the PDF. It is tough (impossible?)
> to convert PDF.
>
