[ https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707280#comment-16707280 ]
Tim Allison commented on TIKA-2791:
-----------------------------------
I'd want to focus on a handful of common tags: p, div, ul, ol, li, table, tr,
td, u, i, b, a. Any others?
> Add structure tags to tika-eval
> -------------------------------
>
> Key: TIKA-2791
> URL: https://issues.apache.org/jira/browse/TIKA-2791
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in
> tika-eval. We could also detect and flag malformed structure tags, such as
> improperly nested ones, e.g.:
> <i><u></i></u>
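
To make the idea concrete, here is a minimal sketch of counting structure tags and flagging mis-nested end tags. It is not tika-eval code: it is written in Python with the standard library's html.parser for brevity, and the names StructureTagChecker, check_structure, and STRUCTURE_TAGS are hypothetical. The tag set is the list proposed in the comment above. Only end tags that do not match the innermost open tag are flagged; tags that are never closed are left unreported.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags proposed in the comment; illustrative only, not a tika-eval setting.
STRUCTURE_TAGS = {"p", "div", "ul", "ol", "li", "table", "tr", "td",
                  "u", "i", "b", "a"}


class StructureTagChecker(HTMLParser):
    """Hypothetical helper: counts structure tags, records mis-nested end tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()   # start-tag counts per structure tag
        self.open_tags = []       # stack of currently open structure tags
        self.problems = []        # (end tag seen, innermost open tag or None)

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURE_TAGS:
            self.counts[tag] += 1
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in STRUCTURE_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()  # properly nested close
            return
        # The end tag does not match the innermost open tag: flag it.
        expected = self.open_tags[-1] if self.open_tags else None
        self.problems.append((tag, expected))
        # Recover by dropping the most recent matching open tag, if any,
        # so one mistake does not cascade into further reports.
        if tag in self.open_tags:
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]


def check_structure(markup):
    """Return (tag counts, list of mis-nested end tags) for a markup string."""
    checker = StructureTagChecker()
    checker.feed(markup)
    checker.close()
    return checker.counts, checker.problems


counts, problems = check_structure("<i><u></i></u>")
print(dict(counts))
print(problems)
```

For the example in the issue description, the checker reports one problem: the </i> end tag arrives while <u> is still the innermost open tag. Comparing the counts from two extraction runs would then be an ordinary comparison of two Counter objects.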
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)