Improve TextPosition.isDiacritic and ICU4JImpl normalizeDiac performance
------------------------------------------------------------------------

                 Key: PDFBOX-1080
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1080
             Project: PDFBox
          Issue Type: Improvement
          Components: Text extraction
    Affects Versions: 1.6.0
            Reporter: Lars Torunski
            Priority: Minor
             Fix For: 1.7.0


Character.getType(cText.charAt(0)), together with the index range check
inside charAt, is evaluated up to three times where a single call would do.

Current 1.6.0 implementation:

    public boolean isDiacritic()
    {
        String cText = this.getCharacter();
        return (cText.length() == 1 && (Character.getType(cText.charAt(0)) == Character.NON_SPACING_MARK
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_SYMBOL
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_LETTER));
    }

Please use something like this:

    public boolean isDiacritic()
    {
        final String cText = this.getCharacter();
        if (cText.length() != 1) return false;
        final int type = Character.getType(cText.charAt(0));
        return (type == Character.NON_SPACING_MARK
                || type == Character.MODIFIER_SYMBOL
                || type == Character.MODIFIER_LETTER);
    }
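
As a quick sanity check that the rewrite does not change behaviour, the two
variants can be compared side by side. This is only a sketch: the class name
DiacriticCheck and the two static helpers are hypothetical, and they take the
string as a parameter instead of calling TextPosition.getCharacter(), so the
example runs without PDFBox on the classpath.

```java
public class DiacriticCheck {

    // 1.6.0 logic, adapted to take the text as an argument (assumption:
    // in PDFBox the value comes from TextPosition.getCharacter()).
    public static boolean isDiacriticOld(String cText) {
        return (cText.length() == 1 && (Character.getType(cText.charAt(0)) == Character.NON_SPACING_MARK
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_SYMBOL
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_LETTER));
    }

    // Proposed logic: look up the character type once and reuse it.
    public static boolean isDiacriticNew(String cText) {
        if (cText.length() != 1) {
            return false;
        }
        final int type = Character.getType(cText.charAt(0));
        return (type == Character.NON_SPACING_MARK
                || type == Character.MODIFIER_SYMBOL
                || type == Character.MODIFIER_LETTER);
    }

    public static void main(String[] args) {
        // One sample per accepted category, plus inputs that must be rejected:
        // U+0301 (non-spacing mark), U+00B4 (modifier symbol),
        // U+02B0 (modifier letter), a plain letter, an empty string,
        // and a two-character string.
        String[] samples = {"\u0301", "\u00B4", "\u02B0", "a", "", "ab"};
        for (String s : samples) {
            if (isDiacriticOld(s) != isDiacriticNew(s)) {
                throw new AssertionError("variants disagree on input of length " + s.length());
            }
        }
        System.out.println("both variants agree on all samples");
    }
}
```

The early return for strings whose length is not 1 also keeps charAt(0) from
being reached on an empty string, which the original guarded against through
the short-circuiting && operator.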


Please also check the ICU4JImpl.normalizeDiac method.


        
