Unfortunately, the PDF-to-HTML conversion does not turn PDF tables into HTML
tables.

The PDF-to-HTML utility simply takes the output of the PDF text extraction
process and wraps it in a few simple HTML tags so that you can view the
extracted text in a web browser. It does not preserve table structure
except as raw extracted text (assuming the table is embedded as text and
not as an image). It also does not preserve images, though those can be
extracted separately and, in theory, recombined with the HTML by hand.
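To make that concrete, here is a minimal Java sketch of what "wrapping
extracted text in simple HTML tags" amounts to. It is not PDFBox's own code:
the class and method names are hypothetical, and the input lines stand in
for whatever the text extraction step produced. Because the wrapper only
ever sees lines of text, a table comes through as lines of text, never as
<table>/<tr>/<td> markup.

```java
import java.util.List;

// Hypothetical illustration only; not PDFBox's actual HTML output code.
public class TextToHtmlSketch {

    // Escape the characters that are special inside HTML text content.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Each extracted line becomes escaped text followed by <br>.
    // Nothing here knows about rows or columns, so table layout is lost.
    static String wrapAsHtml(String title, List<String> extractedLines) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>").append(escapeHtml(title))
            .append("</title></head><body>\n");
        for (String line : extractedLines) {
            html.append(escapeHtml(line)).append("<br>\n");
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF table.
        List<String> lines = List.of(
            "Item    Qty    Price",
            "Widget  2      $3.50",
            "A & B   1      $10.00");
        System.out.print(wrapAsHtml("Extracted text", lines));
    }
}
```

The column alignment in the output depends entirely on the spacing the
extractor produced, which is why spreadsheet-style structure cannot be
recovered reliably from this kind of HTML.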

-----Original Message-----
From: Daniel Wilson [mailto:[email protected]] 
Sent: Wednesday, February 10, 2010 8:53 AM
To: [email protected]
Subject: Re: Question

Jake, I'm really not sure PDFBox will help with that.

But ... maybe.

You could try the PDFBox utility that converts PDFs to HTML. If your PDFs
come out as tables in HTML, then it would not be a long step to derive a
utility that outputs the XML format that Excel 2007 supports.
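If the cells of each row can be recovered (for example from HTML table
cells), producing Excel-readable XML is mostly string building. This is a
minimal sketch, assuming the older "XML Spreadsheet 2003" format (which
Excel 2007 can open) rather than the zipped .xlsx format; the class name and
the "digits mean Number, everything else means String" rule are
hypothetical choices for illustration.

```java
import java.util.List;

// Hypothetical sketch: rows of already-separated cells -> XML Spreadsheet 2003.
public class SpreadsheetXmlSketch {

    // Escape characters that are special in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Assumed rule: plain integers/decimals are typed as numbers.
    static boolean isNumeric(String s) {
        return s.matches("-?\\d+(\\.\\d+)?");
    }

    static String toSpreadsheetXml(List<List<String>> rows) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>\n");
        xml.append("<?mso-application progid=\"Excel.Sheet\"?>\n");
        xml.append("<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n");
        xml.append("          xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n");
        xml.append(" <Worksheet ss:Name=\"Sheet1\">\n  <Table>\n");
        for (List<String> row : rows) {
            xml.append("   <Row>");
            for (String cell : row) {
                String type = isNumeric(cell) ? "Number" : "String";
                xml.append("<Cell><Data ss:Type=\"").append(type).append("\">")
                   .append(escapeXml(cell)).append("</Data></Cell>");
            }
            xml.append("</Row>\n");
        }
        xml.append("  </Table>\n </Worksheet>\n</Workbook>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Stand-in for cells recovered from an HTML table.
        List<List<String>> rows = List.of(
            List.of("Item", "Qty", "Price"),
            List.of("Widget", "2", "3.50"));
        System.out.print(toSpreadsheetXml(rows));
    }
}
```

Saving the output with an .xml extension should let Excel open it as a
worksheet; the hard part remains getting clean cell boundaries out of the
PDF in the first place.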

As far as VLOOKUPs go, I really don't know.

Daniel Wilson

On Tue, Feb 9, 2010 at 10:30 PM, Jake Shin <[email protected]> wrote:

>
> Hi,
>
> I was wondering whether this will help me read a PDF file and put it into
> the MS Excel format I want. Also, I have to do VLOOKUPs on 4-5 columns in
> Excel after I put the values in. Is this possible?
>
> Kindly,
>
