RE: [fw-general] Extracting data out of PDF with Zend_Pdf?

Alexander Veremyev Fri, 31 Aug 2007 13:01:18 -0700

Hi Markus,

Many thanks for the testing!


It looks like an "info only" PDF loading mode would be a good feature to
have.

The number of document pages is currently calculated dynamically. The
pages structure is usually a tree with pages at the leaves, so each tree
element has to be loaded to check whether it's a page node or a pages
aggregation node. That triggers loading of the complete page data.
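To illustrate why the tree walk ends up loading everything, here is a
minimal PHP sketch. It assumes a hypothetical page tree modeled as plain
arrays (not Zend_Pdf's real internal classes) and counts leaves by
visiting every node, bumping a counter where a real parser would load
the object from the file:

```php
<?php
// Hypothetical model of a PDF page tree: each node is an array whose
// 'Type' is either 'Pages' (intermediate node with 'Kids') or 'Page' (leaf).
// This is NOT Zend_Pdf's internal representation; it only shows why
// counting pages by walking the tree touches every node.

/**
 * Count leaf pages by visiting every node of the tree.
 * $visited is incremented once per node, standing in for the cost of
 * loading that object from the PDF file.
 */
function countPagesByWalking(array $node, int &$visited): int
{
    $visited++; // a real parser would load this object here
    if ($node['Type'] === 'Page') {
        return 1;
    }
    $pages = 0;
    foreach ($node['Kids'] as $kid) {
        $pages += countPagesByWalking($kid, $visited);
    }
    return $pages;
}

// Root with one direct page and one intermediate node holding two pages.
$root = [
    'Type'  => 'Pages',
    'Count' => 3,
    'Kids'  => [
        ['Type' => 'Page'],
        [
            'Type'  => 'Pages',
            'Count' => 2,
            'Kids'  => [['Type' => 'Page'], ['Type' => 'Page']],
        ],
    ],
];

$visited = 0;
$pages = countPagesByWalking($root, $visited);
echo "pages: $pages, nodes loaded: $visited\n";
```

For this three-page tree all five nodes are visited. A real document has
one page object per page plus the intermediate nodes, which is why page
loading costs so much on large files.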

It's also possible to get the number of pages without actually
processing the tree. The root node of the pages tree contains the number
of leaves under it (== number of pages).
It can be retrieved in the context of a Zend_Pdf object with the
following expression:
-------------------------------
$this->_trailer->Root->Pages->Count->value
----------
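A minimal sketch of that shortcut, again using a hypothetical array model
of the parsed PDF rather than Zend_Pdf's actual classes. The function
name countPagesFromRoot is made up for illustration; it reads only the
root /Count entry and never looks at /Kids:

```php
<?php
// Sketch only: the array layout is a hypothetical stand-in for a parsed PDF.
// Inside a Zend_Pdf object the equivalent value is
// $this->_trailer->Root->Pages->Count->value

function countPagesFromRoot(array $pagesRoot): int
{
    if (!isset($pagesRoot['Count'])) {
        throw new InvalidArgumentException('Pages root has no /Count entry');
    }
    // Only the root dictionary is read; 'Kids' is never touched.
    return (int) $pagesRoot['Count'];
}

// 'Kids' is left unloaded (null) to show it is not needed.
$pagesRoot = ['Type' => 'Pages', 'Count' => 250, 'Kids' => null];

echo countPagesFromRoot($pagesRoot), "\n";
```

Per the PDF specification, /Count on a pages node is the number of leaf
(page) descendants, so the root value equals the document's page count.
A damaged file could carry a wrong value, so the full tree walk would
remain the fallback.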

I am thinking about what the best API would be for retrieving the page
count this way...


With best regards,
   Alexander Veremyev.


> -----Original Message-----
> From: Markus Fischer [mailto:[EMAIL PROTECTED] 
> Sent: Friday, August 31, 2007 9:51 PM
> To: Alexander Veremyev
> Cc: Zend Framework General
> Subject: Re: [fw-general] Extracting data out of PDF with Zend_Pdf?
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi,
> 
> Alexander Veremyev wrote:
> > Zend_Pdf preloads the PDF object reference tables and pages. Both
> > operations take considerable time and memory.
> > 
> > I think page loading may be omitted in some cases, which may save a
> > lot of resources, but it should be tested. Could I ask you to do this?
> > :) (It looks like you have a good set of "real world" PDF examples.)
> > Please comment out line 294 of the library/Zend/Pdf.php file (current
> > SVN version):
> > ---------------------------------------------
> > //                $this->_loadPages($this->_trailer->Root->Pages);
> > ---------------
> > Note: the $pdf->pages array will be empty.
> 
> 
> I tested it and it worked quite well. It's much faster, of 
> course, and memory consumption is more conservative.
> 
> But the number of pages (currently I only get this information with
> count($pdf->pages)) is one of the important pieces of metadata I need
> to know about a PDF. Is there a way to get the number of pages without
> parsing the complete PDF into memory?
> 
> thanks,
> - - Markus
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFG2FT01nS0RcInK9ARAgdgAJsHqQE5TUthP8A6W2JTlv7QoMkiMgCgkzuu
> ZKTHVeRe5EJHHovV1sn1z70=
> =PHGU
> -----END PGP SIGNATURE-----
>
