[
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796097#comment-17796097
]
Axel Howind commented on PDFBOX-5660:
-------------------------------------
I made some small changes to improve readability (but this always depends on
the eye of the beholder, up to you to decide):
* use switch instead of if ... else or if ... || ... || ... for better
readability
* make some private methods static
-> added patch "use_switch_for_readability.patch"
I also made some more relevant changes to BaseParser.java and COSParser.java:
* isEndOfName() accepts the (integer) value -1, thus explicit checking for -1
before ccalling the method can be omitted. For this to work, the argument must
not be cast to char. Otherwise, if you think the code is clearer when casting
to char, isEndOfName should be changed to expect a char instead of an int
argument.
* I also changed isEndOfName() to use a switch as I think this is more
readable.
* In parseCOSName(), I changed the while-condition and removed the explicit
break as I think the code is much clearer like that.
* Also in parseCOSName, I have changed the logic to test for UTF-8 and then
decode the buffer to instead decode the buffer and in case of an exception, use
the alternative charset. Prior to the change, the buffer data was decoded
twice, the first decoding was only used to check if it succeeds and decide what
charset to use.
* The alternative charset used prior to this change was always "Windows-1252".
I have changed that to fallback to use "Windows-1252" if available and
otherwise fall back to StandardCharsets.ISO_8859_1. Both charsets are nearly
the same, except some special characters (see the Wikipedia
[article|https://en.wikipedia.org/wiki/Windows-1252] for details). I did this
because StandardCharsets.ISO_8859_1 is guaranteed to be available on all
machines while "Windows-1252" might not. I tried the PDF linked in PDFBOX-3347,
but to my surprise did not see the alternative encoding being used.
-> added patch "refactor_isEndOfName.patch"
[~tilman] I do all my changes on trunk (Java 11), so I cannot guarantee
everything can be applied to 3.0 / 2.0. (Just saw that you had to correct the
string conversion patch for 3.0).
> Improve code quality (5)
> ------------------------
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Tilman Hausherr
> Priority: Minor
> Attachments: AnnotationSample.Standard.pdf,
> DRY_refactoring_Typ2CharStringParser.patch,
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
> Simplify_string_conversion_in_PDFHighlighter.patch,
> avoid_multiple_unboxing.patch, code_cleanup.patch,
> do_not_create_temporary_File_instance.patch,
> extract_common_code,_move_toUpperCase()_out_of_loop.patch,
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch,
> make_inner_class_static.patch, refactor_isEndOfName.patch,
> remove_code_duplication_in_Type2CharStringParser.patch,
> remove_obsolete_class_NullOutputStream.patch,
> remove_unnecessary_calls_to_toString()_String_valueOf().patch,
> replace_System_getProperty()_calls.patch, screenshot-1.png,
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
> simplify_stream_operations.patch, use_Map_ofEntries().patch,
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch,
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
> use_String_join().patch, use_switch_for_readability.patch,
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the
> SonarQube report, hints in different IDEs, the FindBugs tool and other code
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]