Automatically let all valid XHTML 1.0 attributes through from HTML documents
----------------------------------------------------------------------------
Key: TIKA-430
URL: https://issues.apache.org/jira/browse/TIKA-430
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Ken Krugler
Assignee: Ken Krugler
Many consumers of parse output don't want to process the raw (unnormalized)
elements they'd get with the IdentityHtmlMapper, but they do want any
standard attributes. For example, with <a> elements they would want to get
any rel attributes.
I believe this would require changing the DefaultHtmlMapper to "know" about
valid attributes for different elements.
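
As a rough illustration of what "knowing" about valid attributes could look
like, here is a minimal, self-contained sketch of a per-element attribute
whitelist. The class name SafeAttributeTable, the method name, and the
table contents are assumptions made for this example; it is not the actual
DefaultHtmlMapper code, and the attribute lists are only a partial subset of
what XHTML 1.0 allows.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: maps an element name to the attributes that
// should be passed through. Not part of Tika; names are illustrative.
final class SafeAttributeTable {

    // Partial, illustrative subset of XHTML 1.0 attributes per element.
    private static final Map<String, Set<String>> SAFE_ATTRIBUTES =
            new HashMap<String, Set<String>>();

    static {
        SAFE_ATTRIBUTES.put("a", attrs(
                "charset", "type", "name", "href", "hreflang",
                "rel", "rev", "shape", "coords"));
        SAFE_ATTRIBUTES.put("img", attrs(
                "src", "alt", "longdesc", "height", "width",
                "usemap", "ismap"));
    }

    private SafeAttributeTable() {
    }

    private static Set<String> attrs(String... names) {
        return Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList(names)));
    }

    /**
     * Returns the normalized (lower-case) attribute name if it is
     * whitelisted for the given element, or null if it should be dropped.
     */
    static String mapSafeAttribute(String elementName, String attributeName) {
        Set<String> safe =
                SAFE_ATTRIBUTES.get(elementName.toLowerCase(Locale.ENGLISH));
        if (safe == null) {
            return null; // element has no whitelisted attributes
        }
        String attr = attributeName.toLowerCase(Locale.ENGLISH);
        return safe.contains(attr) ? attr : null;
    }
}
```

With a table like this, a mapper could keep rel on <a> while still dropping
non-standard attributes such as onclick handlers or vendor extensions. A
lookup keyed by element name keeps the rule data separate from the mapping
logic, so extending coverage to more elements only means adding entries.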