[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796097#comment-17796097 ]
Axel Howind commented on PDFBOX-5660: ------------------------------------- I made some small changes to improve readability (but this always depends on the eye of the beholder, up to you to decide): * use switch instead of if ... else or if ... || ... || ... for better readability * make some private methods static -> added patch "use_switch_for_readability.patch" I also made some more relevant changes to BaseParser.java and COSParser.java: * isEndOfName() accepts the (integer) value -1, thus explicit checking for -1 before ccalling the method can be omitted. For this to work, the argument must not be cast to char. Otherwise, if you think the code is clearer when casting to char, isEndOfName should be changed to expect a char instead of an int argument. * I also changed isEndOfName() to use a switch as I think this is more readable. * In parseCOSName(), I changed the while-condition and removed the explicit break as I think the code is much clearer like that. * Also in parseCOSName, I have changed the logic to test for UTF-8 and then decode the buffer to instead decode the buffer and in case of an exception, use the alternative charset. Prior to the change, the buffer data was decoded twice, the first decoding was only used to check if it succeeds and decide what charset to use. * The alternative charset used prior to this change was always "Windows-1252". I have changed that to fallback to use "Windows-1252" if available and otherwise fall back to StandardCharsets.ISO_8859_1. Both charsets are nearly the same, except some special characters (see the Wikipedia [article|https://en.wikipedia.org/wiki/Windows-1252] for details). I did this because StandardCharsets.ISO_8859_1 is guaranteed to be available on all machines while "Windows-1252" might not. I tried the PDF linked in PDFBOX-3347, but to my surprise did not see the alternative encoding being used. -> added patch "refactor_isEndOfName.patch" [~tilman] I do all my changes on trunk (Java 11), so I cannot guarantee everything can be applied to 3.0 / 2.0. (Just saw that you had to correct the string conversion patch for 3.0). > Improve code quality (5) > ------------------------ > > Key: PDFBOX-5660 > URL: https://issues.apache.org/jira/browse/PDFBOX-5660 > Project: PDFBox > Issue Type: Improvement > Reporter: Tilman Hausherr > Priority: Minor > Attachments: AnnotationSample.Standard.pdf, > DRY_refactoring_Typ2CharStringParser.patch, > Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch, > > Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch, > Simplify_string_conversion_in_PDFHighlighter.patch, > avoid_multiple_unboxing.patch, code_cleanup.patch, > do_not_create_temporary_File_instance.patch, > extract_common_code,_move_toUpperCase()_out_of_loop.patch, > fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, > make_inner_class_static.patch, refactor_isEndOfName.patch, > remove_code_duplication_in_Type2CharStringParser.patch, > remove_obsolete_class_NullOutputStream.patch, > remove_unnecessary_calls_to_toString()_String_valueOf().patch, > replace_System_getProperty()_calls.patch, screenshot-1.png, > simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch, > simplify_stream_operations.patch, use_Map_ofEntries().patch, > use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, > use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch, > use_String_join().patch, use_switch_for_readability.patch, > use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch > > > This is a longterm issue for the task to improve code quality, by using the > SonarQube report, hints in different IDEs, the FindBugs tool and other code > quality tools. > This is a follow-up of PDFBOX-4892, which was getting too long. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org