Re: [fw-general] Extracting data out of PDF with Zend_Pdf?

Markus Fischer Thu, 30 Aug 2007 16:14:27 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I just discovered another need ... however, I think this won't be easily
implemented.


Currently the complete PDF needs to be parsed into memory, even if all I
want from a PDF is the metadata information.

Would it be possible to implement a smart way to extract metadata
information without parsing everything into memory ... ?

Some PDF files I tested needed more than 128M of memory to be parsed,
even though all I need is Title and Author ... and besides the memory,
parsing also takes quite some time.
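
To illustrate the idea, here is a rough sketch that follows the
cross-reference table at the end of the file and reads only the Info
dictionary. readPdfInfoLite() is a made-up name and not part of Zend_Pdf.
It assumes an unencrypted PDF whose last cross-reference section is a
classic "xref" table (no xref streams) and whose Title/Author are plain
literal strings, so it will return nothing for many real-world files.
It is written for current PHP.

```php
<?php
// Hypothetical helper, not part of Zend_Pdf. Assumes an unencrypted PDF whose
// last cross-reference section is a classic "xref" table (no xref streams).
function readPdfInfoLite(string $path, int $tailBytes = 2048): array
{
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return [];
    }
    try {
        // 1. The "startxref <offset>" marker sits in the last bytes of the file.
        $size = (int) filesize($path);
        fseek($fh, max(0, $size - $tailBytes));
        $tail = (string) fread($fh, $tailBytes);
        if (!preg_match('~startxref\s+(\d+)\s+%%EOF\s*$~', $tail, $m)) {
            return [];
        }

        // 2. Read only the xref table and trailer, which run to the end of the file.
        fseek($fh, (int) $m[1]);
        $section = (string) stream_get_contents($fh);
        $trailerPos = strpos($section, 'trailer');
        if (strncmp($section, 'xref', 4) !== 0 || $trailerPos === false) {
            return [];
        }
        if (!preg_match('~/Info\s+(\d+)\s+(\d+)\s+R~', substr($section, $trailerPos), $ref)) {
            return [];
        }
        $infoNum = (int) $ref[1];
        $infoGen = (int) $ref[2];

        // 3. Walk the table to find the byte offset of the Info object.
        $offset = null;
        $current = null;
        foreach (preg_split('~\r\n|\r|\n~', substr($section, 0, $trailerPos)) as $line) {
            $line = trim($line);
            if (preg_match('~^(\d+) (\d+)$~', $line, $sub)) {
                $current = (int) $sub[1]; // subsection header: first object number
            } elseif ($current !== null && preg_match('~^(\d{10}) (\d{5}) ([nf])$~', $line, $e)) {
                if ($current === $infoNum && $e[3] === 'n') {
                    $offset = (int) $e[1];
                    break;
                }
                $current++;
            }
        }
        if ($offset === null) {
            return [];
        }

        // 4. Read just the Info object and pull out literal-string entries.
        fseek($fh, $offset);
        $chunk = (string) fread($fh, 4096);
        $objPattern = '~^\s*' . $infoNum . '\s+' . $infoGen . '\s+obj\s*<<(.*?)>>~s';
        if (!preg_match($objPattern, $chunk, $obj)) {
            return [];
        }
        $info = [];
        foreach (['Title', 'Author'] as $key) {
            // Literal string in parentheses, allowing backslash-escaped characters.
            $pattern = '~/' . $key . '\s*\(((?:\\\\.|[^\\\\()])*)\)~s';
            if (preg_match($pattern, $obj[1], $val)) {
                $info[$key] = stripcslashes($val[1]);
            }
        }
        return $info;
    } finally {
        fclose($fh);
    }
}
```

Only the tail of the file, the xref section and up to 4096 bytes of the
Info object are read, so memory use stays small even for big files.
Hex strings, UTF-16 strings, encrypted files and xref streams are not
handled here.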

thanks,
- - Markus

Markus Fischer wrote:
> Hey!
> 
> This is great, I just saw your commit and tested it. I saw that the API
> has changed:
> 
> * $oPdf->properties is now a property, not a method anymore
> * $oPdf->getMetaData() returns the metadata as an XML (RDF) string
> 
> I tested it with quite a few PDFs and it worked very well. I also
> realized that the amount of information in the properties can vary: some
> PDFs have a "Title", others don't.
> 
> Is there a difference in practice between the distilled information
> through the properties property and the RDF data?
> 
> thank you!
> - Markus
> 
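
Since not every PDF carries a "Title", a small wrapper can make the
missing fields explicit. This is only a minimal sketch:
pickPdfProperties() is a made-up helper name, and the commented usage
assumes the properties property and the pages array behave as described
in this thread.

```php
<?php
// Hypothetical helper: pick the wanted fields from the document properties
// array, using null for anything a PDF does not declare (or declares empty).
function pickPdfProperties(array $properties, array $wanted = ['Title', 'Author']): array
{
    $result = [];
    foreach ($wanted as $key) {
        $value = $properties[$key] ?? null;
        $result[$key] = ($value === null || $value === '') ? null : $value;
    }
    return $result;
}

// Usage with the API as described in this thread
// (needs Zend Framework on the include path):
// $oPdf = Zend_Pdf::load($sFile);
// $meta = pickPdfProperties($oPdf->properties);
// $meta['Pages'] = count($oPdf->pages);
```
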
> Alexander Veremyev wrote:
>> Hi Markus,
> 
>> Thanks for the offered help!
> 
>> I mentioned the JIRA issue only to indicate that the feature had already
>> been requested, which increases its chances of being done soon :)
>> I am going to look into it and work out a plan for it tomorrow.
> 
>> With best regards,
>>    Alexander Veremyev.
> 
>>> -----Original Message-----
>>> From: Markus Fischer [mailto:[EMAIL PROTECTED] 
>>> Sent: Monday, August 27, 2007 11:54 PM
>>> To: Alexander Veremyev
>>> Cc: Zend Framework General
>>> Subject: Re: [fw-general] Extracting data out of PDF with Zend_Pdf?
>>>
>> Hi Alexander,
> 
>> thank you for answering so quickly. I'll search JIRA next time.
> 
>> I'm not new to PHP, but the PDF spec is quite complex and so is
>> the PDF implementation ... unfortunately I don't have enough time
>> to dig into it, otherwise I'd love to help and come up with a patch.
> 
>> So I hope it will get implemented soon, this would really be great.
> 
>> thanks,
>> - Markus
> 
>> Alexander Veremyev wrote:
>>>>> Hi  Markus,
>>>>>
>>>>> PDF properties processing is planned
>>>>> (http://framework.zend.com/issues/browse/ZF-294), but not done yet.
>>>>>
>>>>> It's not the first request for the feature and implementation is 
>>>>> relatively simple. I think it should be done in the near future.
>>>>>
>>>>>
>>>>> With best regards,
>>>>>    Alexander Veremyev.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Markus Fischer [mailto:[EMAIL PROTECTED]
>>>>>> Sent: Sunday, August 26, 2007 10:37 PM
>>>>>> To: Zend Framework General
>>>>>> Subject: [fw-general] Extracting data out of PDF with Zend_Pdf?
>>>>>>
>>>>> Hi,
>>>>>
>>>>> is extracting metadata information from a PDF supported? The
>>>>> information I'm seeking is
>>>>> * title
>>>>> * number of pages
>>>>> * author
>>>>>
>>>>> (of course as long as the information is contained in the PDF).
>>>>>
>>>>> I've gone through quite a few PDFs where Adobe Reader shows me title
>>>>> and author information, but from Zend_Pdf I get nothing back.
>>>>>
>>>>> Following the documentation, I thought I could get this information
>>>>> from the properties() method, e.g.
>>>>>
>>>>> $oPdf = Zend_Pdf::load($sFile);
>>>>> var_dump( $oPdf->properties() );
>>>>>
>>>>> But the returned array was empty in all cases.
>>>>>
>>>>> I know I can get the number of pages by counting the "pages" 
>>>>> property, but what about the other information?
>>>>>
>>>>> If it's not possible with Zend_Pdf, although off-topic, what other 
>>>>> possibilities are out there? fpdf? Or some unix commands (I'm on 
>>>>> Linux)?
>>>>>
>>>>> thanks,
>>>>> - Markus
>>>>>
>>>>> ps: I was using 1.0.1
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG109Q1nS0RcInK9ARAmoPAJsGXp8DuD72lFpirddPV6WLX3ke8ACgqF5I
7glEVrmvYgZxIJEf3HGeEg8=
=Emla
-----END PGP SIGNATURE-----
