On 18.10.21 at 08:36, [email protected] wrote:
On Monday, 18.10.2021 at 07:26 +0200, Andreas Lehmkuehler wrote:
On 17.10.21 at 20:39, [email protected] wrote:
On Sunday, 17.10.2021 at 12:45 +0200, Tilman Hausherr wrote:
+1
Yes, this should be done, although I don't really know how. Maybe it's just Windows, or maybe it's just me, but I found it difficult to get reliable, reproducible benchmarks.
IMHO they are reproducible within the same environment and under a controlled workload on the system they are running on. And they only provide a baseline for further analysis. OTOH they should help us find larger improvements as well as degradations.
If we use the same set of test files, they will at least also help us look at performance (and memory consumption, btw) from common ground.
I agree with Maruan. Such tests don't qualify as deterministic cases for the build process, but they may give some valuable results when looking for performance/resource issues.
What about starting with a rendering and/or text extraction test and taking it from there? As noted, I'd see that in an extra package similar to examples so we can run it on a case-by-case basis.
We should add the save test compressed/uncompressed as well.
I already have that as part of PDFBOX-5286 locally.
If you're fine with it, I'll create a new performance subproject and add the stuff I have. It will be handled by a new ticket.
WDYT?
I'm ok with that.
Andreas
BR
Maruan
Andreas
BR
Maruan
Tilman
On 14.10.2021 at 21:21, [email protected] wrote:
Hi,
given that there is PDFBOX-5286, first noted in PDFBOX-5068, and we also see variations in performance between releases, creating a testbed for performance testing came to my mind. I did some very basic tests using JMH, some of which are noted in the above tickets.
What about formalizing that? Similar to the testing done by the Tika colleagues when it comes to text extraction.
Cases I see are around parsing, saving, rendering and text extraction, and some basic workflows such as filling a form field.
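As a purely illustrative sketch of what one such timing case could look like, here is a naive plain-JDK harness (no JMH, no PDFBox; the workload method is a placeholder standing in for an actual PDFBox call such as parsing or rendering a test file):

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

/**
 * Naive micro-benchmark harness -- an illustrative stand-in only;
 * a real setup would use JMH. The workload is a placeholder, not
 * an actual PDFBox call.
 */
public class PerfBaseline {

    /** Placeholder workload; in the subproject this would be e.g. a parse or render call. */
    public static long workload() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += i;
        }
        return sum;
    }

    /** Runs the workload repeatedly and returns the median wall-clock time in nanoseconds. */
    public static long medianNanos(int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            workload(); // give the JIT a chance to compile the hot path
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            workload();
            samples[i] = System.nanoTime() - t0;
        }
        Arrays.sort(samples);
        return samples[iterations / 2]; // median is more robust to outliers than the mean
    }

    public static void main(String[] args) {
        long median = medianNanos(5, 11);
        System.out.printf("median: %d us%n", TimeUnit.NANOSECONDS.toMicros(median));
    }
}
```

Taking the median over several iterations dampens the run-to-run noise mentioned above, but it is no substitute for JMH's forking and dead-code-elimination safeguards.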
As runtime will differ between environments, it might be worth creating an extra subproject for that and running it as needed. We can take the numbers from the first run and create a baseline file from that if we'd like to have some kind of automated comparison...
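The automated comparison against such a baseline could be as simple as the following sketch (the 15% tolerance, the numbers and the units are made up for illustration; in practice they would come from the baseline file and the current benchmark run):

```java
/**
 * Compares a fresh measurement against a recorded baseline.
 * Tolerance, values and units are hypothetical -- just to
 * illustrate the idea of an automated comparison.
 */
public class BaselineCheck {

    /** Allow this relative slowdown before flagging a regression. */
    public static final double TOLERANCE = 0.15;

    /** Returns true if the current time is within tolerance of the baseline. */
    public static boolean withinBaseline(double baselineMs, double currentMs) {
        return currentMs <= baselineMs * (1.0 + TOLERANCE);
    }

    public static void main(String[] args) {
        double baselineMs = 120.5; // would be loaded from the baseline file of the first run
        double currentMs = 130.0;  // would come from the actual benchmark run
        System.out.println(withinBaseline(baselineMs, currentMs) ? "OK" : "REGRESSION");
    }
}
```

A fixed relative tolerance is the simplest choice; it deliberately ignores the environment-to-environment variance discussed above, which is why the baseline should be recorded and compared on the same machine.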
Having some common tests will help us find regressions earlier and also help test enhancements against a defined set of files. This would complement the functionality-based tests we have and also the larger test runs done for text extraction and rendering.
WDYT?
Maruan Sahyoun
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]