[
https://issues.apache.org/jira/browse/PDFBOX-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819286#comment-16819286
]
Tilman Hausherr commented on PDFBOX-4514:
-----------------------------------------
The {{synchronized(LOG)}} is because of weird effects when calling
{{ICC_Profile.getInstance()}} from several threads on linux, see PDFBOX-3267
and PDFBOX-3641 and
https://bugs.openjdk.java.net/browse/JDK-8058973
https://bugs.openjdk.java.net/browse/JDK-6986863
> inefficient use of synchronized in PDICCBased.java
> --------------------------------------------------
>
> Key: PDFBOX-4514
> URL: https://issues.apache.org/jira/browse/PDFBOX-4514
> Project: PDFBox
> Issue Type: Bug
> Reporter: Jason
> Priority: Minor
>
> PDICCBased.java uses synchronized with static variable, e.g. synchronized
> (LOG) . It doesn't look to me it really needs to do it this way. This is very
> inefficient when multiple threads process different PDF at the same time.
> Change it to synchronized (this) will improve the performance.
> [https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191]
> Sample code simulates multiple threads process different PDF at the same time:
>
> {code:java}
> public static void main(String[] args) throws IOException {
> for (int i = 0; i < 10; i++) { // just run multiple time
> doWork();
> }
> }
> private static void doWork() throws IOException {
> long startTime = System.currentTimeMillis();
> String pdfFilename = "<absolute path to your pdf file>"; // replace this
> with your test file
> System.setProperty("sun.java2d.cmm",
> "sun.java2d.cmm.kcms.KcmsServiceProvider");
> PDDocument document = PDDocument.load(new File(pdfFilename));
> List<PDDocument> pdfPages = new Splitter().split(document);
> Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();
> for (int i = 0; i < pdfPages.size(); i++) {
> pdfPagesWithIndex.put(i, pdfPages.get(i));
> }
> // multiple threads running in parallel
> pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
> try {
> processPDF(entry.getKey(), entry.getValue());
> } catch (Exception e) {
> System.out.println(e);
> }
> });
> System.out.println("Convertion time: " + (System.currentTimeMillis() -
> startTime));
> try {
> document.close();
> } catch (IOException ignored) {
> }
> }
> private static void processPDF(int index, PDDocument pdfPage) throws
> IOException {
> PDFRenderer renderer = new PDFRenderer(pdfPage);
> try {
> renderer.renderImageWithDPI(0, 180, ImageType.RGB);
> } catch (IOException e) {
> System.out.println(e);
> }
> try {
> pdfPage.close();
> } catch (IOException ignored) {
> }
> }
> {code}
> I observed by changing synchronized (LOG) to synchronized (this), the above
> code can have maybe 20-30% reduction in latency. If I do a thread dump, I can
> see many threads are blocked on synchronized (LOG).
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]