[ https://issues.apache.org/jira/browse/TIKA-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-4728.
-------------------------------
Resolution: Fixed
This fixes quite a bit and bakes balanced-tag checking into the unit tests.
Users need to know that the xhtml may still be malformed when an exception
occurs during parsing. We're doing what we can but, ahem, things happen.
When in doubt, use jsoup or a similar lenient parser to handle potentially
malformed xhtml robustly.
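
For callers who want to detect malformed output before falling back to a
lenient parser, here is a minimal sketch of a strict well-formedness check
using only the JDK's SAX parser. The class and method names (XhtmlCheck,
isWellFormed) are hypothetical and this is not Tika's actual test code; it
only illustrates the idea of checking that tags are balanced.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika.
class XhtmlCheck {

    // Returns true if the string parses as well-formed XML,
    // i.e. every tag is closed and properly nested.
    static boolean isWellFormed(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so unbalanced tags
            // surface here as a SAXParseException.
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xhtml)),
                    new DefaultHandler());
            return true;
        } catch (Exception e) {
            // Any parse or configuration failure is treated as "not well-formed".
            return false;
        }
    }

    public static void main(String[] args) {
        // Balanced tags: prints true
        System.out.println(isWellFormed("<html><body><p>ok</p></body></html>"));
        // <p> is never closed before </body>: prints false
        System.out.println(isWellFormed("<html><body><p>cut off</body></html>"));
    }
}
```

If the check returns false, the same string can be handed to jsoup or a
similar lenient parser instead of a strict XML parser.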
> Validate xhtml output, generally
> --------------------------------
>
> Key: TIKA-4728
> URL: https://issues.apache.org/jira/browse/TIKA-4728
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> There's a bug in the XML output that we write for specific JavaScript
> attached in a specific way in PDFs. We should fix that, but we should also
> add more general, more robust testing that we can actually parse our xhtml.