[ 
https://issues.apache.org/jira/browse/PDFBOX-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4514:
------------------------------------
    Affects Version/s: 2.0.15

> inefficient use of synchronized in PDICCBased.java
> --------------------------------------------------
>
>                 Key: PDFBOX-4514
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4514
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.15
>            Reporter: Jason
>            Priority: Minor
>
> PDICCBased.java synchronizes on a static variable, e.g. synchronized (LOG). 
> This does not appear to be necessary, and it is very inefficient when 
> multiple threads process different PDFs at the same time, because they all 
> contend for the same lock. Changing it to synchronized (this) would improve 
> performance.
> [https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191]
> Sample code that simulates multiple threads processing different PDFs at the 
> same time:
>  
> {code:java}
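> import java.io.File;
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> import org.apache.pdfbox.multipdf.Splitter;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> public static void main(String[] args) throws IOException {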
>   for (int i = 0; i < 10; i++) { // repeat the run several times
>     doWork();
>   }
> }
> private static void doWork() throws IOException {
>   long startTime = System.currentTimeMillis();
>   // replace this with your test file
>   String pdfFilename = "<absolute path to your pdf file>";
>   System.setProperty("sun.java2d.cmm", 
> "sun.java2d.cmm.kcms.KcmsServiceProvider");
>   PDDocument document = PDDocument.load(new File(pdfFilename));
>   List<PDDocument> pdfPages = new Splitter().split(document);
>   Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();
>   for (int i = 0; i < pdfPages.size(); i++) {
>     pdfPagesWithIndex.put(i, pdfPages.get(i));
>   }
>   // multiple threads running in parallel
>   pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
>     try {
>       processPDF(entry.getKey(), entry.getValue());
>     } catch (Exception e) {
>       System.out.println(e);
>     }
>   });
>   System.out.println("Conversion time: "
>       + (System.currentTimeMillis() - startTime));
>   try {
>     document.close();
>   } catch (IOException ignored) {
>   }
> }
> private static void processPDF(int index, PDDocument pdfPage)
>     throws IOException {
>   PDFRenderer renderer = new PDFRenderer(pdfPage);
>   try {
>     renderer.renderImageWithDPI(0, 180, ImageType.RGB);
>   } catch (IOException e) {
>     System.out.println(e);
>   }
>   try {
>     pdfPage.close();
>   } catch (IOException ignored) {
>   }
> }
> {code}
> I observed that changing synchronized (LOG) to synchronized (this) reduces 
> the latency of the above code by roughly 20-30%. A thread dump shows many 
> threads blocked on synchronized (LOG). 
>  
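
The difference between the two lock scopes described in the issue can be illustrated in isolation. The following is a minimal sketch, not PDFBox code: the class `Resource`, the field `SHARED` (standing in for the static `LOG`), and all method names are hypothetical. It checks whether a thread working on a second instance can enter its critical section while another thread is still inside the critical section of a first instance.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LockScopeSketch {

    /** Hypothetical stand-in for a per-document object; not a PDFBox class. */
    static class Resource {
        // Plays the role of the static LOG field used as a monitor.
        private static final Object SHARED = new Object();

        void withStaticLock(Runnable body) {
            synchronized (SHARED) { body.run(); }   // one monitor for all instances
        }

        void withInstanceLock(Runnable body) {
            synchronized (this) { body.run(); }     // one monitor per instance
        }
    }

    /**
     * Returns true if a thread using a second instance entered its critical
     * section while the first thread was still holding its own lock.
     */
    static boolean secondInstanceCanProceed(boolean useStaticLock) throws InterruptedException {
        Resource first = new Resource();
        Resource second = new Resource();
        CountDownLatch firstInside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch secondInside = new CountDownLatch(1);

        // Body of the first thread: signal entry, then stay inside until released.
        Runnable holdBody = () -> {
            firstInside.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread holder = new Thread(() -> {
            if (useStaticLock) first.withStaticLock(holdBody);
            else first.withInstanceLock(holdBody);
        });
        Thread other = new Thread(() -> {
            if (useStaticLock) second.withStaticLock(secondInside::countDown);
            else second.withInstanceLock(secondInside::countDown);
        });

        holder.start();
        firstInside.await();                 // the first thread now holds its lock
        other.start();
        // Give the second thread a bounded amount of time to get in.
        boolean entered = secondInside.await(500, TimeUnit.MILLISECONDS);
        release.countDown();                 // let both threads finish
        holder.join();
        other.join();
        return entered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("static lock, second instance proceeds:   "
                + secondInstanceCanProceed(true));
        System.out.println("instance lock, second instance proceeds: "
                + secondInstanceCanProceed(false));
    }
}
```

Note that synchronized (this) only guards state belonging to that instance. Whether it is sufficient in PDICCBased depends on whether the guarded code touches state shared across instances, which this sketch does not examine.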

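The thread-dump observation can also be checked from inside the running JVM. Below is a rough sketch using only `Thread.getAllStackTraces()` and `Thread.State.BLOCKED` from the standard library. The class name `BlockedThreadSketch` and the `demo()` helper are hypothetical and exist only to create one blocked thread for illustration. In the reporter's scenario, `reportBlockedThreads()` would be called while the rendering threads are running.

```java
import java.util.Map;

public class BlockedThreadSketch {

    /** Prints every thread currently in the BLOCKED state and returns how many there are. */
    static int reportBlockedThreads() {
        int blocked = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getState() == Thread.State.BLOCKED) {
                blocked++;
                StackTraceElement[] stack = e.getValue();
                String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
                System.out.println(t.getName() + " is BLOCKED at " + top);
            }
        }
        return blocked;
    }

    /** Hypothetical demo: one thread is made to wait for a monitor held by the caller. */
    static int demo() throws InterruptedException {
        Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; the point is waiting to get in
            }
        }, "waiter");
        int blocked;
        synchronized (lock) {
            waiter.start();
            // Wait until the waiter is actually stuck on the monitor.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            blocked = reportBlockedThreads();
        }
        waiter.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocked threads: " + demo());
    }
}
```

A full thread dump (for example from `jstack`) additionally shows which monitor each thread is waiting for. This sketch only reports the blocked threads and their top stack frame.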


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
