[ 
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632581#comment-14632581
 ] 

John Hewson commented on PDFBOX-2882:
-------------------------------------

Those are some great improvements.

{quote}
freePages.cardinality: looking at the implementation of BitSet.cardinality you 
will see that it is calculated thus I will keep freePageCount for fast access
{quote}

Long.bitCount is built into the JVM because it's fast. Counting 1s is a 
blazingly fast operation - very similar to addition under the hood. The 
performance change for ScratchFile from doing this will be indistinguishable 
from zero. Remember, pageCount is volatile, so it's behind a memory barrier, 
and it requires an extra register / cache entry, so it's not free either. 
You're getting into nanosecond performance "improvements" based on pure 
speculation - unless you benchmark this stuff (which is hard indeed) you're 
almost certainly going to draw the wrong conclusions. Far better to just keep 
the code simple, which increases its chance of being correct.
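
To make the "keep it simple" point concrete, here is a minimal sketch of 
deriving the count from the BitSet on demand instead of maintaining a second 
field. FreePageTracker and its method names are hypothetical stand-ins for 
illustration, not the code in the attached ScratchFile.java.

```java
import java.util.BitSet;

// Hypothetical stand-in for the free-page bookkeeping; not PDFBox's ScratchFile.
final class FreePageTracker {
    private final BitSet freePages = new BitSet();

    void markFree(int pageIdx) {
        freePages.set(pageIdx);
    }

    void markUsed(int pageIdx) {
        freePages.clear(pageIdx);
    }

    // Derived on demand: BitSet.cardinality() returns the number of set bits,
    // so there is no separate counter that could drift out of sync.
    int freePageCount() {
        return freePages.cardinality();
    }

    public static void main(String[] args) {
        FreePageTracker tracker = new FreePageTracker();
        for (int i = 0; i < 1000; i++) {
            tracker.markFree(i);
        }
        tracker.markUsed(3);
        tracker.markUsed(3); // repeating the call cannot corrupt a derived count
        System.out.println(tracker.freePageCount()); // prints 999
    }
}
```

With a hand-maintained counter, every set/clear path has to update two pieces 
of state consistently; here there is only one.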

{quote}
close logic: since we have not only pages stored in RandomAccessFile but also 
in-memory pages we need to track isClosed by ourself
{quote}

Why can't we read from a "closed" page? It's not like a file where that's an 
impossible operation. Look at Sun's in-memory streams: ByteArrayOutputStream 
doesn't throw an error for read-after-close, because there's no need. I'm not 
saying that it's a legitimate use of the API, but it's not something we need to 
guard against, especially as the API is not public. There's only one call to 
close() in the whole of PDFBox, so it's not difficult to validate that we're 
already calling the API correctly. The failure mode is that the caller will get 
the data which they expected... why do we need an exception to prevent that?
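
For reference, a small sketch of the JDK behaviour I mean. The class and 
method names are made up for the example; the only thing it relies on is that 
ByteArrayOutputStream.close() is documented as having no effect, so the 
buffered bytes remain accessible afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only exercises standard java.io behaviour.
final class ReadAfterCloseSketch {

    // Writes the text, closes the stream, then reads the buffer back.
    static String writeCloseRead(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        try {
            out.close(); // no effect for an in-memory stream
        } catch (IOException e) {
            // close() declares IOException, so it has to be handled here
            throw new UncheckedIOException(e);
        }
        // Still works after close(): the data lives on the heap, not in a file.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(writeCloseRead("scratch page")); // prints scratch page
    }
}
```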

> Improve performance when using scratch file
> -------------------------------------------
>
>                 Key: PDFBOX-2882
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2882
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>         Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls which 
> slows down parsing compared with in-memory scratch buffer considerably.



