The text stripper utilities are structured so that a small amount of extra
code could do this, provided you define the positional ranges of the
different columns in your PDF yourself, outside the library.

It takes moderate programming skill to pull this off.  Is that in scope for
you?
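
If it helps, here is a rough sketch of the column-assignment half of the
idea in plain Java.  It is not PDFBox code: the ColumnSplitter and Fragment
names are made up for illustration, and it assumes you have already
collected each piece of text together with its x/y position from the text
stripper.  The column boundaries are the externally defined ranges
mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of PDFBox): sorts positioned text fragments
// into table cells using column x-ranges supplied by the caller.
class ColumnSplitter {

    // A piece of text plus the position it was drawn at, in page units.
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Column i covers the half-open range [bounds[i], bounds[i + 1]).
    private final float[] bounds;

    ColumnSplitter(float... bounds) {
        if (bounds.length < 2) {
            throw new IllegalArgumentException("need at least two boundaries");
        }
        this.bounds = bounds.clone();
    }

    // Returns the column index for x, or -1 if x is outside every range.
    int columnFor(float x) {
        for (int i = 0; i + 1 < bounds.length; i++) {
            if (x >= bounds[i] && x < bounds[i + 1]) {
                return i;
            }
        }
        return -1;
    }

    // Groups fragments into rows (y within yTolerance of the row's first
    // fragment), then places each fragment in its column, left to right.
    // Rows come out in ascending y order.
    List<String[]> toRows(List<Fragment> fragments, float yTolerance) {
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> lines = new ArrayList<>();
        float lineY = 0;
        for (Fragment f : byY) {
            if (lines.isEmpty() || f.y - lineY > yTolerance) {
                lines.add(new ArrayList<>());
                lineY = f.y;
            }
            lines.get(lines.size() - 1).add(f);
        }

        List<String[]> rows = new ArrayList<>();
        for (List<Fragment> line : lines) {
            line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            String[] cells = new String[bounds.length - 1];
            Arrays.fill(cells, "");
            for (Fragment f : line) {
                int col = columnFor(f.x);
                if (col < 0) {
                    continue; // outside all column ranges: dropped
                }
                cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

With boundaries 0, 100, 200 you get two columns.  Each String[] row could
then be written out as a line of CSV, which Excel opens directly.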

On Wed, Feb 10, 2010 at 6:35 AM, Martinez, Mel - 1004 - MITLL <
[email protected]> wrote:

> Unfortunately, the pdf->html conversion does not convert pdf tables to html
> tables.
>
> The pdf->html utility simply takes the output of the PDF text extraction
> process and wraps it in some simple html tags so that you can view the
> extracted text in a web browser.  It does not preserve table structures
> other than as raw extracted text (assuming the table is embedded as text
> and not as an image).  It also does not preserve images - though those can
> be extracted separately and in theory re-combined with the html by hand.
>
> -----Original Message-----
> From: Daniel Wilson [mailto:[email protected]]
> Sent: Wednesday, February 10, 2010 8:53 AM
> To: [email protected]
> Subject: Re: Question
>
> Jake, I'm really not sure PDFBox will help with that.
>
> But ... maybe.
>
> You could try the PDFBox utility that converts PDFs to HTML.  If your PDFs
> are coming out as tables in HTML ... then ... it would not be a long step
> to derive a utility that outputs the XML format that Excel 2007 supports.
>
> As far as vlookups, I really don't know.
>
> Daniel Wilson
>
> On Tue, Feb 9, 2010 at 10:30 PM, Jake Shin <[email protected]> wrote:
>
> >
> > Hi,
> >
> >
> >
> > I was wondering whether this will help me read a PDF file and put it into
> > the MS Excel format that I want.  Also, I have to do VLOOKUPs on 4-5
> > columns in Excel after I put the values in.  Is this possible?
> >
> >
> >
> > Kindly,
> >
>



-- 
Ted Dunning, CTO
DeepDyve
