I just discovered another need ... however, I think this won't be easy to implement.
Currently the complete PDF needs to be parsed into memory, even if all I want from a PDF is the metadata information. Would it be possible to implement a smart way to extract metadata information without parsing everything into memory ... ? Some PDF files I tested needed more than 128M of memory to be parsed even though all I need is Title and Author ... and besides memory, it also takes quite some time.

thanks,
- Markus

Markus Fischer wrote:
> Hey!
>
> This is great, I just saw your commit and tested it. I saw that the API
> has changed:
>
> * $oPdf->properties is now a property, not a method anymore
> * $oPdf->getMetaData() returns some XML RDF sequence
>
> I tested it with quite a few PDFs and it worked very well. I also
> realized that the amount of information in the properties can vary; some
> have a "Title", others don't.
>
> Is there a difference in practice between the distilled information
> in the properties property and the RDF data?
>
> thank you!
> - Markus
>
> Alexander Veremyev wrote:
>> Hi Markus,
>
>> Thanks for the offered help!
>
>> I mentioned the JIRA issue only to indicate that the feature was already
>> requested. So it increases its chances to be done in a short time :)
>> Actually I am going to take a look into it and determine plans for it
>> tomorrow.
>
>> With best regards,
>> Alexander Veremyev.
>
>>> -----Original Message-----
>>> From: Markus Fischer [mailto:[EMAIL PROTECTED]]
>>> Sent: Monday, August 27, 2007 11:54 PM
>>> To: Alexander Veremyev
>>> Cc: Zend Framework General
>>> Subject: Re: [fw-general] Extracting data out of PDF with Zend_Pdf?
>>>
>> Hi Alexander,
>
>> thank you for answering so quickly. I'll search JIRA next time.
>
>> I'm not new to PHP, but the PDF spec is quite complex and so is
>> the PDF implementation ... unfortunately I don't have enough time
>> to dig into it; I'd love to help and come up with a patch.
>
>> So I hope it will get implemented soon, this would really be great.
>
>> thanks,
>> - Markus
>
>> Alexander Veremyev wrote:
>>>>> Hi Markus,
>>>>>
>>>>> PDF properties processing is planned
>>>>> (http://framework.zend.com/issues/browse/ZF-294), but not done yet.
>>>>>
>>>>> It's not the first request for the feature and implementation is
>>>>> relatively simple. I think it should be done in the near future.
>>>>>
>>>>>
>>>>> With best regards,
>>>>> Alexander Veremyev.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Markus Fischer [mailto:[EMAIL PROTECTED]]
>>>>>> Sent: Sunday, August 26, 2007 10:37 PM
>>>>>> To: Zend Framework General
>>>>>> Subject: [fw-general] Extracting data out of PDF with Zend_Pdf?
>>>>>>
>>>>> Hi,
>>>>>
>>>>> is it supported to extract metadata information from a PDF? The
>>>>> information I'm seeking is
>>>>> * title
>>>>> * number of pages
>>>>> * author
>>>>>
>>>>> (of course as long as the information is contained in the PDF).
>>>>>
>>>>> I've gone through quite a few PDFs where Adobe Reader shows me title
>>>>> and author information, but from Zend_Pdf I get nothing back.
>>>>>
>>>>> Following the documentation, I thought I could get this information from
>>>>> the properties() method, e.g.
>>>>>
>>>>> $oPdf = Zend_Pdf::load($sFile);
>>>>> var_dump( $oPdf->properties() );
>>>>>
>>>>> But the returned array was empty in all cases.
>>>>>
>>>>> I know I can get the number of pages by counting the "pages"
>>>>> property, but what about the other information?
>>>>>
>>>>> If it's not possible with Zend_Pdf, although off-topic, what other
>>>>> possibilities are out there? fpdf? Or some unix commands (I'm on
>>>>> Linux)?
>>>>>
>>>>> thanks,
>>>>> - Markus
>>>>>
>>>>> ps: I was using 1.0.1
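
On the memory question at the top of this message (reading Title and Author without loading the whole PDF): below is a minimal sketch of one possible workaround, independent of Zend_Pdf. It assumes the PDF embeds an XMP packet (the same kind of XML RDF data that getMetaData() returns) and that the packet is stored uncompressed and unencrypted, which is common but not guaranteed. PDFs that only carry a classic Info dictionary, or that compress their metadata, will yield nothing. The function names findXmpPacket() and xmpField() and the 1 MB packet limit are made up for illustration; they are not part of any library.

```php
<?php
/**
 * Scan a file in fixed-size chunks and return the first XMP packet found,
 * or null if there is none. While searching, only one chunk plus a few
 * bytes of overlap are kept in memory. Hypothetical helper, not Zend_Pdf.
 */
function findXmpPacket(string $path, int $chunkSize = 65536, int $maxPacket = 1048576): ?string
{
    $open  = '<x:xmpmeta';
    $close = '</x:xmpmeta>';
    $keep  = strlen($open) - 1; // overlap needed to catch a marker split across chunks

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }

    $buffer = '';
    $inside = false;
    $result = null;

    while (!feof($fh)) {
        $data = fread($fh, $chunkSize);
        if ($data === false) {
            break;
        }
        $buffer .= $data;

        if (!$inside) {
            $start = strpos($buffer, $open);
            if ($start === false) {
                // No opening marker yet: drop everything except the overlap tail.
                if (strlen($buffer) > $keep) {
                    $buffer = substr($buffer, -$keep);
                }
                continue;
            }
            $buffer = substr($buffer, $start);
            $inside = true;
        }

        $end = strpos($buffer, $close);
        if ($end !== false) {
            $result = substr($buffer, 0, $end + strlen($close));
            break;
        }
        if (strlen($buffer) > $maxPacket) {
            break; // assumed safety limit: give up on oversized or unterminated packets
        }
    }

    fclose($fh);
    return $result;
}

/**
 * Return the first rdf:li value inside the given element (for example
 * dc:title or dc:creator), or null. A regex is used instead of an XML
 * parser to keep the sketch short; it is not a full XMP reader.
 */
function xmpField(string $xmp, string $tag): ?string
{
    $element = '~<' . preg_quote($tag, '~') . '\b[^>]*>(.*?)</' . preg_quote($tag, '~') . '>~s';
    if (!preg_match($element, $xmp, $m)) {
        return null;
    }
    if (!preg_match('~<rdf:li\b[^>]*>(.*?)</rdf:li>~s', $m[1], $li)) {
        return null;
    }
    return html_entity_decode(trim($li[1]), ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

Usage would look like calling findXmpPacket($sFile) and, if the result is not null, passing it to xmpField($xmp, 'dc:title') and xmpField($xmp, 'dc:creator'). The number of pages is not part of this data, so that would still need the "pages" property of a fully loaded document.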
