Hi,

As I see it (I had only a quick look at the implementation), the ScratchFileBuffer implementation is not optimal for fast random access. Single-byte writes are not buffered but written directly to the file, which causes a lot of I/O operations, and seek operations have to traverse the linked page list, reading some bytes of each page, which again causes a lot of seek and read I/O operations. To speed things up, it is crucial to minimize the number of I/O operations that go directly to the random access file. Therefore we need to buffer writes, keep the last read page in memory for sequential reads, and maintain an in-memory cache of page metadata (offset, link to previous/next page).
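
To illustrate the idea, here is a minimal sketch of a page-buffered random access file. It is not the PDFBox ScratchFileBuffer: the class name CachedPageFile, the page size, and the ioOps counter are all hypothetical, and it uses fixed-offset pages instead of a linked page list, so no page metadata cache is needed. It only shows how keeping one page in memory turns many single-byte accesses into a few page-sized ones.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/** Hypothetical sketch: one in-memory page in front of a RandomAccessFile. */
final class CachedPageFile implements AutoCloseable {
    private static final int PAGE_SIZE = 4096; // assumed page size

    private final RandomAccessFile raf;
    private final byte[] page = new byte[PAGE_SIZE]; // the single cached page
    private long pageIndex = -1;   // index of the cached page, -1 = none loaded
    private boolean dirty = false; // cached page has changes not yet on disk
    private long length;           // logical length in bytes
    private int ioOps = 0;         // number of real page reads/writes (illustration only)

    CachedPageFile(File file) throws IOException {
        raf = new RandomAccessFile(file, "rw");
        length = raf.length();
    }

    /** Writes one byte; only touches the file when the page changes. */
    void write(long pos, int b) throws IOException {
        loadPage(pos / PAGE_SIZE);
        page[(int) (pos % PAGE_SIZE)] = (byte) b;
        dirty = true;
        length = Math.max(length, pos + 1);
    }

    /** Reads one byte, or returns -1 past the logical end. */
    int read(long pos) throws IOException {
        if (pos < 0 || pos >= length) {
            return -1;
        }
        loadPage(pos / PAGE_SIZE);
        return page[(int) (pos % PAGE_SIZE)] & 0xFF;
    }

    private void loadPage(long index) throws IOException {
        if (index == pageIndex) {
            return; // cache hit: no file access at all
        }
        flush(); // write back the old page before replacing it
        Arrays.fill(page, (byte) 0);
        long offset = index * PAGE_SIZE;
        if (offset < raf.length()) {
            raf.seek(offset);
            int n = 0;
            int r;
            while (n < PAGE_SIZE && (r = raf.read(page, n, PAGE_SIZE - n)) > 0) {
                n += r;
            }
            ioOps++;
        }
        pageIndex = index;
    }

    /** Writes the cached page back to the file if it was modified. */
    void flush() throws IOException {
        if (dirty) {
            long offset = pageIndex * PAGE_SIZE;
            int valid = (int) Math.min(PAGE_SIZE, length - offset);
            raf.seek(offset);
            raf.write(page, 0, valid);
            ioOps++;
            dirty = false;
        }
    }

    int ioOps() {
        return ioOps;
    }

    @Override
    public void close() throws IOException {
        flush();
        raf.close();
    }
}
```

With this sketch, writing 10000 bytes one at a time followed by a flush results in three page writes to the file rather than 10000 single-byte writes.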


Best,
Timo


On 14.07.2015 at 12:15, Manfred Pock wrote:
Yes, the input is an InputStream. I can try loading it directly from a file.

But in general we get the PDF from a document management system as a stream.
Does it make sense to save the PDF to a file first?

Why is there such a big performance difference between the version from
May and the current version when we use useScratchFiles = true?

Regards, Manfred

On 14.07.2015 at 12:02, Andreas Lehmkühler wrote:
Hi,

Manfred Pock <pock.manf...@gmail.com> wrote on 14 July 2015 at 11:39:


OK, we load the PDF with useScratchFiles = true. If we load it with
false, the performance is better, but still a little slower than the old
version.
What do you use as input, a stream or a real file? If the latter, you
should use the load method with the file parameter.

PDFBox needs random access to the PDF, and if a stream is provided, PDFBox
copies the data to a file (lower memory usage, slower performance) or to
memory (higher memory usage, better performance).
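
The trade-off can be sketched with plain JDK classes. This is not PDFBox code: StreamSpooler and its two methods are hypothetical names, and the sketch only shows the two general ways of making a one-pass stream randomly accessible.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: two ways to make a stream randomly accessible. */
final class StreamSpooler {

    /** Copies the stream to a temp file: low heap usage, but disk I/O on every access. */
    static File toTempFile(InputStream in) throws IOException {
        File tmp = File.createTempFile("spool", ".pdf");
        tmp.deleteOnExit();
        Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Copies the stream into a byte array: fast access, but the whole file sits on the heap. */
    static byte[] toMemory(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```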

BR
Andreas


But now it needs more memory. I cannot load some PDFs with the current
version using the same Java memory configuration.

On 14.07.2015 at 11:26, Manfred Pock wrote:
Hi,

We use the PDFBox trunk version to render PDFs; currently we use the
version from 12 May 2015.

Today I updated to the current version and tested it.
It seems that it now needs much more time to render PDFs; how much
depends on the size of the PDF.

For example, you can try this one:

http://cloud.directupload.net/15bu

It takes five times as long as the version from May 2015.

Regards, Manfred
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4      | fax: +49 345 478 047 1
email: ulf.la...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal | USt-IdNr.: DE815563824
managing director : Lutz Weber
