[
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848357#comment-17848357
]
Jonathan Prates edited comment on PDFBOX-5823 at 5/21/24 7:25 PM:
------------------------------------------------------------------
I've attached a profiler screenshot, and it seems that a predicate (even a static
one created only once) is not a good option. Could you compare on your
side as well? If you don't mind, please have a look at Main-1.java and
Screenshot 2024-05-21 at 20.21.43.png. Perhaps I'm missing something.
was (Author: JIRAUSER305510):
I've attached a profiler screenshot, and it seems that a predicate (even a static
one created only once) is not a good option. Could you compare on your
side as well?
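For comparison, here is a minimal Java sketch of the predicate approach. It assumes the predicate in Main-1.java is built with Pattern.asPredicate() from a pattern equivalent to StringUtil.PATTERN_SPACE (the attachment is not reproduced here, so this is an assumption). Even when the predicate is a static constant, asPredicate() returns a function that calls matcher(s).find(), so every test still allocates a new Matcher, whereas a plain indexOf() check allocates nothing. The class and method names are hypothetical.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class PredicateSketch {
    // Assumption: equivalent to StringUtil.PATTERN_SPACE, i.e. the regex \s.
    static final Pattern PATTERN_SPACE = Pattern.compile("\\s");

    // Created only once, but each test(word) call runs matcher(word).find(),
    // which allocates a new Matcher for every word.
    static final Predicate<String> HAS_SPACE = PATTERN_SPACE.asPredicate();

    // indexOf(int) scans the chars directly and allocates nothing.
    // Note: only space and tab are checked, as in the proposed change.
    static boolean hasSpaceOrTab(String word) {
        return word.indexOf(' ') >= 0 || word.indexOf('\t') >= 0;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Hello", "two words", "tab\there"}) {
            System.out.println(HAS_SPACE.test(word) + " " + hasSpaceOrTab(word));
        }
    }
}
```

Making the predicate static only saves the cost of compiling the pattern; the per-word Matcher allocation remains, which would be consistent with the profiler results described above.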
> StringUtil.PATTERN_SPACE memory optimisation
> --------------------------------------------
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 3.0.3 PDFBox
> Reporter: Jonathan Prates
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Main-1.java, Main.java, Screenshot 2024-05-19 at 22.39.10.png,
> Screenshot 2024-05-19 at 22.40.17.png, Screenshot 2024-05-21 at 20.21.43.png
>
>
> PDAbstractContentStream uses the StringUtil.PATTERN_SPACE regular expression to
> check whether a word contains whitespace
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents (~800 pages) and short strings (such as a single
> word), this causes memory overhead (see attached screenshots) due to the
> extra allocations made on every call. I've replaced the regular expression
> with word.contains() checks for space and \t; since contains() is a simple
> character scan that requires no extra allocations, memory usage has been reduced.
> What would be the implications of replacing this block with contains()?
> Given that \s matches [ \t\n\x0B\f\r], I believe the contains() check is a
> simplified version that allocates less memory.
>
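A minimal Java sketch of the three variants discussed in the description: the regular-expression check (assumed here to have the shape PATTERN_SPACE.matcher(word).find(); the actual code at the linked line may differ), the contains() replacement for space and \t, and a hypothetical single-pass helper, hasSpaceScan, that covers the full [ \t\n\x0B\f\r] set without allocating. None of these names are PDFBox API.

```java
import java.util.regex.Pattern;

public class SpaceCheckSketch {
    // Assumption: equivalent to StringUtil.PATTERN_SPACE, i.e. the regex \s.
    static final Pattern PATTERN_SPACE = Pattern.compile("\\s");

    // Regex check (assumed shape): allocates a new Matcher on every call.
    static boolean hasSpaceRegex(String word) {
        return PATTERN_SPACE.matcher(word).find();
    }

    // contains() replacement described in the issue: space and tab only.
    // Linear scan of the word; no Matcher is created.
    static boolean hasSpaceContains(String word) {
        return word.contains(" ") || word.contains("\t");
    }

    // Hypothetical allocation-free helper covering all of [ \t\n\x0B\f\r],
    // the default meaning of \s in java.util.regex.
    static boolean hasSpaceScan(String word) {
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = {"word", "two words", "line\nbreak"};
        for (String word : words) {
            // Print newlines as the two characters \n so each result fits on one line.
            System.out.println(word.replace("\n", "\\n")
                    + ": regex=" + hasSpaceRegex(word)
                    + ", contains=" + hasSpaceContains(word)
                    + ", scan=" + hasSpaceScan(word));
        }
    }
}
```

For "line\nbreak" the regex and the scan both report true while the contains() version reports false: contains(" ") and contains("\t") cover only two of the six characters that \s matches. Whether that difference matters depends on which strings can reach this code path in PDAbstractContentStream.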
--
This message was sent by Atlassian Jira
(v8.20.10#820010)