[
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632581#comment-14632581
]
John Hewson commented on PDFBOX-2882:
-------------------------------------
Those are some great improvements.
{quote}
freePages.cardinality: looking at the implementation of BitSet.cardinality you
will see that it is calculated thus I will keep freePageCount for fast access
{quote}
Long.bitCount is built into the JVM because it's fast. Counting 1s is a
blazingly fast operation - very similar to addition under the hood. The
performance change for ScratchFile by doing this will be indistinguishable from
zero. Remember, pageCount is volatile, so it's behind a memory barrier, and it
requires an extra register / cache entry, so it's not free either. You're
getting into nanosecond performance "improvements" due to pure speculation -
unless you benchmark this stuff (which is hard indeed) you're almost certainly
going to draw the wrong conclusions. Far better to just keep the code simple,
which increases its chance of being correct.
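To make the "keep it simple" option concrete, here is a minimal sketch that derives the free-page count from the BitSet on demand instead of maintaining a separate counter. FreePageTracker and its method names are hypothetical stand-ins, not the actual ScratchFile code; the only assumption about the JDK is that BitSet.cardinality() returns the number of set bits (in OpenJDK it sums Long.bitCount over the backing words).

```java
import java.util.BitSet;

// Hypothetical stand-in for the free-page bookkeeping discussed above;
// this is NOT the real ScratchFile implementation.
class FreePageTracker {
    private final BitSet freePages = new BitSet();

    void markFree(int pageIdx) {
        freePages.set(pageIdx);
    }

    void markUsed(int pageIdx) {
        freePages.clear(pageIdx);
    }

    // Derived on demand from the BitSet, so there is no second field
    // that has to be kept in sync with it.
    int freePageCount() {
        return freePages.cardinality();
    }

    public static void main(String[] args) {
        FreePageTracker tracker = new FreePageTracker();
        tracker.markFree(3);
        tracker.markFree(3);   // freeing the same page twice cannot double-count
        tracker.markFree(70);  // lands in a second 64-bit word
        tracker.markUsed(3);
        System.out.println(tracker.freePageCount());
    }
}
```

With a single source of truth, the count cannot drift from the bit set, which is the correctness argument made above. Whether the separate counter is measurably faster would still need a real benchmark.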
{quote}
close logic: since we have not only pages stored in RandomAccessFile but also
in-memory pages we need to track isClosed by ourself
{quote}
Why can't we read from a "closed" page? It's not like a file where that's an
impossible operation. Look at Sun's in-memory streams: ByteArrayOutputStream
doesn't throw an error for read-after-close, because there's no need. I'm not
saying that it's a legitimate use of the API, but it's not something we need to
guard against, especially as the API is not public. There's only one call to
close() in the whole of PDFBox so it's not difficult to validate that we're
calling the API correctly already. The failure mode is that the caller will get
the data which they expected... why do we need an exception to prevent that?
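For reference, a small sketch of the JDK behaviour mentioned above. The class and method names are made up for illustration; the JDK facts it relies on are that close() on ByteArrayOutputStream and ByteArrayInputStream is documented as having no effect, so the buffered data stays accessible afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: shows that the JDK's in-memory streams stay usable
// after close(). The class and method names are hypothetical.
class ReadAfterCloseDemo {

    // Writes the data, closes the stream, then reads the buffer back.
    static byte[] writeThenClose(byte[] data) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(data);
            out.close();              // documented as having no effect
            return out.toByteArray(); // buffered bytes are still returned
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Closes the input stream first, then reads the first byte anyway.
    static int readAfterClose(byte[] data) {
        try {
            ByteArrayInputStream in = new ByteArrayInputStream(data);
            in.close();               // also documented as having no effect
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        System.out.println(writeThenClose(data).length);
        System.out.println(readAfterClose(data));
    }
}
```

This only covers the in-memory case; pages backed by a closed RandomAccessFile would of course still fail at the file layer.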
> Improve performance when using scratch file
> -------------------------------------------
>
> Key: PDFBOX-2882
> URL: https://issues.apache.org/jira/browse/PDFBOX-2882
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls which
> slows down parsing compared with in-memory scratch buffer considerably.