Markus Jelsma created TIKA-975:
----------------------------------

             Summary: LinkBuilder to optionally collapse anchor whitespace
                 Key: TIKA-975
                 URL: https://issues.apache.org/jira/browse/TIKA-975
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.2
            Reporter: Markus Jelsma
            Priority: Minor
             Fix For: 1.3

Links extracted by the LinkContentHandler contain the verbatim anchor text. This is usually fine, but many websites spread the anchor text over multiple lines or indent it with tabs or spaces. This patch adds a boolean option to LinkContentHandler that toggles whitespace collapsing on or off. The default behaviour is unchanged and the API remains backward compatible.
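
For illustration only, the collapsing described above might look roughly like the sketch below. This is not the code from the patch: the class name AnchorText and the method normalize are hypothetical, and the sketch assumes that "collapsing" means replacing each run of whitespace with a single space and trimming both ends.

```java
// Hypothetical helper, not part of Tika: sketches optional anchor whitespace collapsing.
class AnchorText {

    /**
     * Returns the anchor text verbatim when collapseWhitespace is false.
     * Otherwise replaces every run of whitespace (spaces, tabs, line breaks)
     * with a single space and trims the ends.
     */
    static String normalize(String raw, boolean collapseWhitespace) {
        if (!collapseWhitespace) {
            return raw; // default: anchor text stays as extracted
        }
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Anchor text as it often appears in indented, multi-line HTML.
        String raw = "\n\t  Apache\n\t  Tika  \n";
        System.out.println("[" + normalize(raw, false) + "]"); // verbatim, line breaks and tabs kept
        System.out.println("[" + normalize(raw, true) + "]");  // prints "[Apache Tika]"
    }
}
```

Keeping the verbatim path as the default mirrors the backward-compatibility note in the description: callers that do not opt in see no change.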