Re: [google-appengine] Extract images from pdf

Vinny P Mon, 07 Apr 2014 19:26:25 -0700

On Mon, Apr 7, 2014 at 6:27 AM, rishidude <[email protected]> wrote:


> I am writing an application where a pdf file can be uploaded and the user
> will be shown some details of the pdf file(some text and images[png]) upon
> searching for the id of the pdf.
>
> Now, in ubuntu, I am using pdfminer to convert to text and pdfimages to
> extract images. Now, here is the problem. In google app engine I am able to
> import pdfminer and use it for text extraction. But, I have not been able
> to extract images. pdfimages is available in ubuntu, but not as a python
> package.
>
> Please help.
>


It depends on the image you're trying to extract. If you're attempting to
extract JPEG images, usually you can extract the image data from the raw
PDF. Here's an example: http://stackoverflow.com/a/2695387
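
For reference, here is a minimal sketch of that approach in pure Python, so it should not need anything outside the sandbox. It relies on the fact that a PDF stores JPEG data unmodified (the DCTDecode filter), so a whole JPEG file sits between the stream and endstream keywords. The function name extract_jpegs is my own, and the scan is a heuristic that assumes the JPEG start marker does not show up by chance inside some other compressed stream, so treat it as a starting point rather than a parser.

```python
def extract_jpegs(pdf_bytes):
    """Return a list of JPEG byte strings found in raw PDF bytes.

    Heuristic sketch: scans for the JPEG start-of-image marker and cuts
    each candidate at the next "endstream" keyword. It does not parse
    the PDF object structure.
    """
    soi = b"\xff\xd8\xff"   # JPEG start-of-image marker plus next marker byte
    eoi = b"\xff\xd9"       # JPEG end-of-image marker
    images = []
    pos = 0
    while True:
        start = pdf_bytes.find(soi, pos)
        if start < 0:
            break
        end = pdf_bytes.find(b"endstream", start)
        if end < 0:
            break
        candidate = pdf_bytes[start:end]
        # Trim the line break (or other padding) before "endstream" by
        # cutting right after the last end-of-image marker.
        last_eoi = candidate.rfind(eoi)
        if last_eoi >= 0:
            images.append(candidate[:last_eoi + len(eoi)])
        pos = end + len(b"endstream")
    return images
```

To use it, read the uploaded file as bytes, pass them to extract_jpegs, and write each returned item out as a .jpg.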

If the image is stored in a different manner, you may have to run
*pdfimages* on a Compute Engine instance or a Managed VM and handle the
extraction there.
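
On such a machine, a rough sketch of calling pdfimages from Python could look like the following. It assumes the poppler build of pdfimages is installed and on the PATH (the -png option belongs to poppler's version; without it the tool writes PPM/PBM files). The helper names and the example paths are hypothetical.

```python
import subprocess


def build_pdfimages_command(pdf_path, output_prefix, as_png=True):
    """Build the argument list for: pdfimages [-png] PDF-file image-root."""
    command = ["pdfimages"]
    if as_png:
        # Assumes poppler's pdfimages, which supports PNG output.
        command.append("-png")
    command.extend([pdf_path, output_prefix])
    return command


def extract_images(pdf_path, output_prefix, as_png=True):
    """Run pdfimages; raises CalledProcessError if the tool exits non-zero."""
    subprocess.check_call(build_pdfimages_command(pdf_path, output_prefix, as_png))
```

For example, extract_images("upload.pdf", "/tmp/upload-img") would leave the extracted image files under /tmp, each name starting with upload-img.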


-----------------
-Vinny P
Technology & Media Advisor
Chicago, IL

App Engine Code Samples: http://www.learntogoogleit.com
