[
https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693507#comment-17693507
]
ASF GitHub Bot commented on TIKA-3979:
--------------------------------------
nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1445166101
Never mind. I was testing the wrong format. We are now concerned only with the
alternative packaging format, and my previous test didn't cover it.
100 documents parsed previously in 10252 ms
100 documents parsed now in 1062 ms
so yeah. big win.
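
For anyone wanting to reproduce a comparison like this, here is a minimal timing sketch. It is not the harness used for the numbers above: `ParseTiming`, `timeBatchMillis`, and `speedup` are hypothetical names, and the per-document parse step is passed in as a plain `Consumer` so the sketch does not assume any particular Tika call. A single wall-clock run like this is noisy, so treat the result as a rough indication only.

```java
import java.util.List;
import java.util.function.Consumer;

public class ParseTiming {
    /** Applies parseOne to every document and returns the elapsed wall-clock time in ms. */
    static <T> long timeBatchMillis(List<T> docs, Consumer<T> parseOne) {
        long start = System.nanoTime();
        for (T doc : docs) {
            parseOne.accept(doc); // hypothetical hook: plug the real parse call in here
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Ratio of the old elapsed time to the new one (greater than 1 means faster). */
    static double speedup(long beforeMillis, long afterMillis) {
        return (double) beforeMillis / afterMillis;
    }

    public static void main(String[] args) {
        // Figures reported in this comment: 100 documents, before and after the change.
        long before = 10252;
        long after = 1062;
        System.out.println("speedup: " + speedup(before, after));
        System.out.println("ms per document: " + before / 100.0 + " -> " + after / 100.0);
    }
}
```

With the figures above this works out to a speedup of roughly 9.65x, or about 102.5 ms per document before and 10.6 ms after.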
> OneNoteParser - Improve performance for deserialization
> -------------------------------------------------------
>
> Key: TIKA-3979
> URL: https://issues.apache.org/jira/browse/TIKA-3979
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 2.7.0
> Reporter: David Xie
> Priority: Major
> Attachments: image-2023-02-20-14-42-10-590.png
>
>
> We noticed some performance issues specific to parsing OneNote files. Our CPU
> profiler reports that the parser spends a lot of time deserializing byte
> arrays (image included below).
> !image-2023-02-20-14-42-10-590.png!