[
https://issues.apache.org/jira/browse/PDFBOX-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702234#comment-17702234
]
Tilman Hausherr commented on PDFBOX-5575:
-----------------------------------------
This is the LZW encoder, which is used only in tests to exercise the decoder,
because LZW is an outdated encoding (FLATE is better). While your code looks
better and has some optimizations, I wonder why the loop in {{findPatternCode}}
goes up. In the old code it goes down so that longer segments are tested first.
(I tested the non-deterministic part of the tests by replacing {{COUNT * 2}}
with {{COUNT}} in {{TestFilters}} and didn't find any length differences, so my
reasoning might be wrong.)
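
One possible reason the scan direction makes no difference to the output can be sketched as follows. This is a minimal illustration, not the PDFBox code: it assumes the lookup is an exact match and that every multi-byte table entry is unique, as in textbook LZW. The class name, both helper methods, and the sample table contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScanDirectionSketch {
    // Hypothetical helper: exact-match lookup, lowest code first.
    static int findAscending(List<byte[]> table, byte[] pattern) {
        for (int i = 0; i < table.size(); i++) {
            if (Arrays.equals(table.get(i), pattern)) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical helper: exact-match lookup, highest code first.
    static int findDescending(List<byte[]> table, byte[] pattern) {
        for (int i = table.size() - 1; i >= 0; i--) {
            if (Arrays.equals(table.get(i), pattern)) {
                return i;
            }
        }
        return -1;
    }

    // Assumed table layout: codes 0..255 are single bytes, 256 and 257 are
    // placeholders for the reserved clear-table and EOD codes, and the
    // multi-byte entries that follow are invented sample data.
    static List<byte[]> sampleTable() {
        List<byte[]> table = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            table.add(new byte[] {(byte) i});
        }
        table.add(new byte[0]);                 // 256: clear-table placeholder
        table.add(new byte[0]);                 // 257: EOD placeholder
        table.add(new byte[] {'a', 'b'});       // 258
        table.add(new byte[] {'a', 'b', 'c'});  // 259
        table.add(new byte[] {'b', 'c'});       // 260
        return table;
    }

    public static void main(String[] args) {
        List<byte[]> table = sampleTable();
        byte[] pattern = {'a', 'b', 'c'};
        // Both directions print 259, because only one entry equals the pattern.
        System.out.println(findAscending(table, pattern));
        System.out.println(findDescending(table, pattern));
    }
}
```

Under these assumptions the direction changes only how many entries are compared before the hit, not which code is returned. If the real lookup can match more than one entry, this reasoning does not apply.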
> optimize LZWFilter
> ------------------
>
> Key: PDFBOX-5575
> URL: https://issues.apache.org/jira/browse/PDFBOX-5575
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 3.0.0 PDFBox
> Reporter: Axel Howind
> Priority: Minor
> Attachments: optimize_LZWFilter.patch
>
>
> I ran the PDFBox tests with a profiler and saw that LZWFilter took up a
> considerable amount of time, so I decided to look at the code. I looked at it
> entirely out of context and tried to understand what is done there and what
> could be changed without altering results.
> * made the private methods static
> * changed the variable/method parameter 'earlyChange' from integer to
> boolean because I thought that would be more readable
> * some minor tweaks
> * it looks like codeTable is initialized quite often, and every time 256
> one-byte arrays are created, so I pre-allocate those byte arrays so that
> they can be shared by all code tables. [~tilman] I assumed the contents of
> the codeTable entries will not be changed, and my analysis of the code seems
> to confirm that (as do the passing unit tests). Please have a look at this
> so I don't break anything.
> * it took me some time to fully understand what findPatternCode() does and
> why it checks the codeTable in reverse order. I more or less recreated that
> method from scratch and I think it should now always be faster: for patterns
> of length 1 no iteration is done, and for longer patterns iteration stops
> once the correct entry is found. As this is the most notable change, please
> take a closer look. Unit tests pass.
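
The shared pre-allocation described in the list above could look roughly like the sketch below. It is not taken from the attached patch: the class, field, and method names are hypothetical, and using {{null}} for the two reserved codes (256 clear-table, 257 EOD) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SharedInitialEntriesSketch {
    // Built once: one single-byte array for each possible byte value.
    private static final byte[][] SINGLE_BYTES = new byte[256][];

    static {
        for (int i = 0; i < 256; i++) {
            SINGLE_BYTES[i] = new byte[] {(byte) i};
        }
    }

    // Hypothetical factory: every new table reuses the same 256 arrays
    // instead of allocating fresh ones on each initialization.
    static List<byte[]> newCodeTable() {
        List<byte[]> table = new ArrayList<>(4096);
        for (byte[] entry : SINGLE_BYTES) {
            table.add(entry);
        }
        table.add(null); // 256: clear-table code (placeholder, assumed)
        table.add(null); // 257: EOD code (placeholder, assumed)
        return table;
    }

    public static void main(String[] args) {
        List<byte[]> first = newCodeTable();
        List<byte[]> second = newCodeTable();
        // Prints true: both tables point at the same array instance.
        System.out.println(first.get(65) == second.get(65));
    }
}
```

Sharing is safe only as long as no code writes into a table entry, which is the assumption the reporter asks to have checked. The lists themselves stay independent, so adding a multi-byte entry to one table does not affect another.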