[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796097#comment-17796097
 ] 

Axel Howind commented on PDFBOX-5660:
-------------------------------------

I made some small changes to improve readability (but this always depends on 
the eye of the beholder, up to you to decide):
 * use switch instead of if ... else or if ... || ... || ... for better 
readability
 * make some private methods static

-> added patch "use_switch_for_readability.patch"



I also made some more relevant changes to BaseParser.java and COSParser.java:
 * isEndOfName() accepts the (integer) value -1, thus explicit checking for -1 
before ccalling the method can be omitted. For this to work, the argument must 
not be cast to char. Otherwise, if you think the code is clearer when casting 
to char, isEndOfName should be changed to expect a char instead of an int 
argument.
 * I also changed isEndOfName() to use a switch as I think this is more 
readable.
 * In parseCOSName(), I changed the while-condition and removed the explicit 
break as I think the code is much clearer like that.
 * Also in parseCOSName, I have changed the logic to test for UTF-8 and then 
decode the buffer to instead decode the buffer and in case of an exception, use 
the alternative charset. Prior to the change, the buffer data was decoded 
twice, the first decoding was only used to check if it succeeds and decide what 
charset to use.
 * The alternative charset used prior to this change was always "Windows-1252". 
I have changed that to fallback to use "Windows-1252" if available and 
otherwise fall back to StandardCharsets.ISO_8859_1. Both charsets are nearly 
the same, except some special characters (see the Wikipedia 
[article|https://en.wikipedia.org/wiki/Windows-1252] for details). I did this 
because StandardCharsets.ISO_8859_1 is guaranteed to be available on all 
machines while "Windows-1252" might not. I tried the PDF linked in PDFBOX-3347, 
but to my surprise did not see the alternative encoding being used.

-> added patch "refactor_isEndOfName.patch"

[~tilman] I do all my changes on trunk (Java 11), so I cannot guarantee 
everything can be applied to 3.0 / 2.0. (Just saw that you had to correct the 
string conversion patch for 3.0).

> Improve code quality (5)
> ------------------------
>
>                 Key: PDFBOX-5660
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5660
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Tilman Hausherr
>            Priority: Minor
>         Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to