[ https://issues.apache.org/jira/browse/PDFBOX-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292920#comment-13292920 ]

Kevin Jackson commented on PDFBOX-1337:
---------------------------------------

A simpler fix is to use ConcurrentHashMap:

### Eclipse Workspace Patch 1.0
#P pdfbox
Index: pdfbox/src/main/java/org/apache/pdfbox/util/PDFOperator.java
===================================================================
--- pdfbox/src/main/java/org/apache/pdfbox/util/PDFOperator.java        (revision 1324793)
+++ pdfbox/src/main/java/org/apache/pdfbox/util/PDFOperator.java        (working copy)
@@ -16,9 +16,8 @@
  */
 package org.apache.pdfbox.util;
 
-import java.util.Collections;
-import java.util.HashMap;
 import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
 
 /**
  * This class represents an Operator in the content stream.
@@ -32,7 +31,7 @@
     private byte[] imageData;
     private ImageParameters imageParameters;
 
-    private static Map operators = Collections.synchronizedMap( new HashMap() );
+    private static Map<String, PDFOperator> operators = new ConcurrentHashMap<String, PDFOperator>();
 
     /**
      * Constructor.
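
The patch only swaps the map type. As a minimal sketch of how the lookup could
then avoid explicit locking, the class below uses putIfAbsent so that racing
threads end up sharing one instance per operator name. CachedOperator and
getName() are hypothetical stand-ins for illustration, not the actual
PDFOperator code; the ID/BI exclusion mirrors the snippet in the issue
description. Note that reading a plain HashMap without synchronization while
another thread writes to it is not safe, which is why a concurrent map is
assumed here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for PDFOperator; only the caching logic is of interest.
final class CachedOperator
{
    // Shared cache; ConcurrentHashMap allows lock-free reads from many threads.
    private static final ConcurrentMap<String, CachedOperator> OPERATORS =
        new ConcurrentHashMap<String, CachedOperator>();

    private final String name;

    private CachedOperator( String name )
    {
        this.name = name;
    }

    static CachedOperator getOperator( String operator )
    {
        if( operator.equals( "ID" ) || operator.equals( "BI" ) )
        {
            // As in the issue's snippet, ID/BI operators are never cached.
            return new CachedOperator( operator );
        }
        CachedOperator operation = OPERATORS.get( operator );
        if( operation == null )
        {
            CachedOperator created = new CachedOperator( operator );
            // putIfAbsent returns the value already present, or null if ours was stored.
            CachedOperator existing = OPERATORS.putIfAbsent( operator, created );
            operation = ( existing != null ) ? existing : created;
        }
        return operation;
    }

    String getName()
    {
        return name;
    }
}
```

With this shape, two threads that miss the cache at the same time may each
construct an object, but only the first one stored is ever returned, so
callers always see the same instance for a given operator name.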

                
> Improve PDFOperator performance on multithreading environment
> -------------------------------------------------------------
>
>                 Key: PDFBOX-1337
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1337
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Utilities
>    Affects Versions: 1.6.0
>            Reporter: Alexis
>         Attachments: thread_dump_pdfbox_1.6.0_PDFOperator_BLOCKED.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With more than 6 threads, calls to PDFOperator#getOperator(String operator)
> still block each other.
> Sample thread dump with 48 threads:
> "pool-1-thread-46" - Thread t@72
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.pdfbox.util.PDFOperator.getOperator(PDFOperator.java:76)
>       at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:441)
>       at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46)
>       at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:175)
>       at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
>       at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
> I propose removing the synchronized wrapper from the "operators" attribute
> and synchronizing only the put operation. (This optimization saves 30
> percent of the time.)
> public class PDFOperator
> {
>     [...]
>     // private static Map operators = Collections.synchronizedMap( new HashMap() );
>     private static Map operators = new HashMap();
>     [...]
>     public static PDFOperator getOperator( String operator )
>     {
>         PDFOperator operation = null;
>         if( operator.equals( "ID" ) || operator.equals( "BI" ) )
>         {
>             //we can't cache the ID operators.
>             operation = new PDFOperator( operator );
>         }
>         else
>         {
>             operation = (PDFOperator)operators.get(operator);
>             if( operation == null )
>             {
>               synchronized (operators) {
>                 operation = (PDFOperator)operators.get(operator);
>                 if ( operation == null ) {
>                   operation = new PDFOperator( operator );
>                   operators.put( operator, operation );
>                 }
>               }
>             }
>         }
>         return operation;
>     }
>     [...]
> }
