Improve TextPosition.isDiacritic and ICU4JImpl normalizeDiac performance
------------------------------------------------------------------------
Key: PDFBOX-1080
URL: https://issues.apache.org/jira/browse/PDFBOX-1080
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 1.6.0
Reporter: Lars Torunski
Priority: Minor
Fix For: 1.7.0
Character.getType(cText.charAt(0)), together with the index range check
inside charAt, is evaluated up to three times where a single call would do.
Current 1.6.0 implementation:
    public boolean isDiacritic()
    {
        String cText = this.getCharacter();
        return (cText.length() == 1
            && (Character.getType(cText.charAt(0)) == Character.NON_SPACING_MARK
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_SYMBOL
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_LETTER));
    }
Please use something like this:
    public boolean isDiacritic()
    {
        final String cText = this.getCharacter();
        if (cText.length() != 1) return false;
        final int type = Character.getType(cText.charAt(0));
        return (type == Character.NON_SPACING_MARK
            || type == Character.MODIFIER_SYMBOL
            || type == Character.MODIFIER_LETTER);
    }
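To check that the rewrite does not change behavior, the two variants can be compared over every single-char string. The following is only a sketch outside of PDFBox: the class DiacriticCheck and its static methods are hypothetical names made up for this illustration, and they take the string as a parameter instead of calling getCharacter().

```java
public final class DiacriticCheck {
    private DiacriticCheck() {}

    // Hypothetical stand-in for the 1.6.0 logic: Character.getType may be
    // evaluated up to three times for the same char.
    static boolean isDiacriticOld(String cText) {
        return cText.length() == 1
            && (Character.getType(cText.charAt(0)) == Character.NON_SPACING_MARK
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_SYMBOL
                || Character.getType(cText.charAt(0)) == Character.MODIFIER_LETTER);
    }

    // Hypothetical stand-in for the proposed logic: the type is looked up once.
    static boolean isDiacriticNew(String cText) {
        if (cText.length() != 1) return false;
        final int type = Character.getType(cText.charAt(0));
        return type == Character.NON_SPACING_MARK
            || type == Character.MODIFIER_SYMBOL
            || type == Character.MODIFIER_LETTER;
    }

    // Counts the char values (U+0000..U+FFFF) on which the two variants disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            String s = String.valueOf((char) c);
            if (isDiacriticOld(s) != isDiacriticNew(s)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Both variants only look at strings of length one, so the loop over all char values covers every input on which they could differ; empty and longer strings return false in both.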
Please also check the ICU4JImpl.normalizeDiac method.