[
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630527#comment-14630527
]
John Hewson commented on PDFBOX-2882:
-------------------------------------
{quote}
I chose the usage of AtomicBoolean and the synchronization deliberately to
minimize synchronization overhead and to minimize blocking of parallel access
{quote}
But you're synchronising on every successful read or write, so you're already
paying the cost where it hurts most. Why add significant complexity to avoid a
single lock acquisition in the case where the file is already closed and an
exception is thrown? Given that exceptions are very expensive, do you really
need to save a microsecond before throwing one, at such a large cost in code
complexity?
{quote}
The nice thing with AtomicBoolean is that it does not use 'normal'
synchronization and comes with lower overhead. Thus where possible I use this,
e.g. for 'isClosed'-checking so this is more lightweight as synchronizing the
whole method.
{quote}
Not if you immediately follow it with a synchronized block, as always happens
in ScratchFile: every successful read (i.e. all of them, in valid code) still
pays the full cost of synchronisation. Reading from a closed file is almost
never going to happen in valid code, yet you're optimising for that case and
introducing a lot of complexity to do so. AtomicBoolean is cheap but it's not
free, so you're actually adding overhead to every call that doesn't result in
an exception.
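To make the comparison concrete, here is a minimal sketch of the two shapes
being discussed. It is not the attached ScratchFile code: the class names
(ClosedCheckSketch, AtomicThenLock, SingleLock), the ioLock field and the
in-memory byte array standing in for the file are all hypothetical, chosen only
for illustration.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names come from PDFBox.
class ClosedCheckSketch {

    /** Variant A: lock-free closed check, followed by a lock on every read. */
    static final class AtomicThenLock {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private final Object ioLock = new Object();
        private final byte[] data = new byte[16]; // stands in for the file

        AtomicThenLock() {
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
        }

        int read(int pos) throws IOException {
            // Volatile read: cheap, but paid on every call.
            if (closed.get()) {
                throw new IOException("already closed");
            }
            // Every successful read still acquires this lock. A close() racing
            // with this read is not handled in this sketch.
            synchronized (ioLock) {
                return data[pos];
            }
        }

        void close() {
            closed.set(true);
        }
    }

    /** Variant B: one lock guards both the closed flag and the data. */
    static final class SingleLock {
        private boolean closed; // guarded by "this"
        private final byte[] data = new byte[16];

        SingleLock() {
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
        }

        synchronized int read(int pos) throws IOException {
            if (closed) {
                throw new IOException("already closed");
            }
            return data[pos];
        }

        synchronized void close() {
            closed = true;
        }
    }
}
```

On the successful path, variant A performs a volatile read plus a lock
acquisition, while variant B performs the lock acquisition alone. Variant A
only skips the lock when the exception is about to be thrown anyway.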
{quote}
Additionally if the page is in-memory the read/write operation does not need
any further synchronization which means it can run in parallel providing very
fast parallel page access to in-memory pages. Only the file access needs to be
synchronized. In principle the file I/O is the operation which will take most
of the time thus it is crucial that such an operation does not block any other
parallel page access not using the file.
{quote}
That's certainly true: you wouldn't want a synchronised method doing expensive
computation before it reaches the resource which actually requires the
synchronisation. However, in ScratchFile that work is simply a few boolean and
arithmetic comparisons, which are very fast, so I doubt that the difference
would even be measurable.
> Improve performance when using scratch file
> -------------------------------------------
>
> Key: PDFBOX-2882
> URL: https://issues.apache.org/jira/browse/PDFBOX-2882
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls which
> slows down parsing compared with in-memory scratch buffer considerably.