[ 
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630527#comment-14630527
 ] 

John Hewson commented on PDFBOX-2882:
-------------------------------------

{quote}
I chose the usage of AtomicBoolean and the synchronization deliberately to 
minimize synchronization overhead and to minimize blocking of parallel access 
{quote}

But you're synchronising on every single successful read or write, so you're 
already paying the cost where it hurts most. Why add significant complexity to 
avoid a single lock acquisition in the case where the file is already closed 
and an exception is thrown? Exceptions are very expensive, so do you really 
need to save a microsecond before throwing one when it comes at such a large 
cost in code complexity?

{quote}
The nice thing with AtomicBoolean is that it does not use 'normal' 
synchronization and comes with lower overhead. Thus where possible I use this, 
e.g. for 'isClosed'-checking so this is more lightweight as synchronizing the 
whole method.
{quote}

Not if you immediately follow it with a synchronized block, as always happens 
in ScratchFile, because every successful read (i.e. all of them, in valid code) 
pays the full cost of synchronisation. Reading from a closed file is almost 
never going to happen in valid code, yet you're trying to optimize for that 
case and introducing a lot of complexity. AtomicBoolean is cheap but it's not 
free, so you're actually adding overhead for every call that doesn't result in 
an exception.
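
To make the comparison concrete, here is a minimal sketch of the two 
approaches. The class and method names are hypothetical and this is not the 
code attached to the issue; the second closed-check inside the lock in the 
first variant is my assumption about what a correct version would need.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical classes for illustration only; not the attached ScratchFile.

/** Variant A: lock-free closed check, followed by a lock on every read. */
class AtomicCheckedStore {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final byte[] data;

    AtomicCheckedStore(byte[] data) {
        this.data = data.clone();
    }

    int read(int pos) throws IOException {
        // Only the failure path (already closed) avoids the lock here.
        if (closed.get()) {
            throw new IOException("store already closed");
        }
        // Every successful read still acquires the lock.
        synchronized (this) {
            // Assumed re-check: close() may have run between the test above
            // and the lock acquisition.
            if (closed.get()) {
                throw new IOException("store already closed");
            }
            return data[pos] & 0xFF;
        }
    }

    void close() {
        synchronized (this) {
            closed.set(true);
        }
    }
}

/** Variant B: one lock per call, plain boolean field, single closed check. */
class SynchronizedStore {
    private boolean closed; // only accessed while holding the lock
    private final byte[] data;

    SynchronizedStore(byte[] data) {
        this.data = data.clone();
    }

    synchronized int read(int pos) throws IOException {
        if (closed) {
            throw new IOException("store already closed");
        }
        return data[pos] & 0xFF;
    }

    synchronized void close() {
        closed = true;
    }
}
```

Both variants acquire the monitor once per successful read. Variant A adds an 
atomic read on top of that and only skips the lock when the call is going to 
throw anyway.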

{quote}
Additionally if the page is in-memory the read/write operation does not need 
any further synchronization which means it can run in parallel providing very 
fast parallel page access to in-memory pages. Only the file access needs to be 
synchronized. In principle the file I/O is the operation which will take most 
of the time thus it is crucial that such an operation does not block any other 
parallel page access not using the file.
{quote}

That's certainly true: you wouldn't want a synchronised method doing expensive 
computation prior to accessing the resource which actually requires the 
synchronization. However, in ScratchFile it's simply a few boolean and 
arithmetic comparisons, which are very fast, so I doubt that the difference in 
ScratchFile would even be measurable.
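
As a rough illustration of how little work that is, here is a sketch of a 
fully synchronized paged buffer. The class name, page size and methods are 
assumptions made for the example, and the pages are kept in memory only; it 
does not reproduce the attached ScratchFile.

```java
import java.io.IOException;

// Hypothetical sketch; class, field and method names are assumptions.
class SyncPageBuffer {
    private static final int PAGE_SIZE = 4096;

    private final byte[][] pages;
    private final long size;
    private boolean closed; // only accessed while holding the lock

    SyncPageBuffer(int pageCount) {
        pages = new byte[pageCount][PAGE_SIZE];
        size = (long) pageCount * PAGE_SIZE;
    }

    synchronized void write(long offset, int value) throws IOException {
        checkAccess(offset);
        pages[(int) (offset / PAGE_SIZE)][(int) (offset % PAGE_SIZE)] = (byte) value;
    }

    synchronized int read(long offset) throws IOException {
        checkAccess(offset);
        return pages[(int) (offset / PAGE_SIZE)][(int) (offset % PAGE_SIZE)] & 0xFF;
    }

    synchronized void close() {
        closed = true;
    }

    // The whole pre-work under the lock: one boolean test and two comparisons.
    // Callers must already hold the lock.
    private void checkAccess(long offset) throws IOException {
        if (closed) {
            throw new IOException("buffer already closed");
        }
        if (offset < 0 || offset >= size) {
            throw new IOException("offset out of range: " + offset);
        }
    }
}
```

Everything done before touching the page is a handful of comparisons plus a 
division and a remainder, so holding the lock across them costs next to 
nothing compared with the page access itself.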

> Improve performance when using scratch file
> -------------------------------------------
>
>                 Key: PDFBOX-2882
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2882
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>         Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls which 
> slows down parsing compared with in-memory scratch buffer considerably.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
