[ https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721634#comment-16721634 ]

Hudson commented on TIKA-2791:
------------------------------

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #363 (See [https://builds.apache.org/job/tika-2.x-windows/363/])
TIKA-2791 -- add tags/structure to tika-eval (tallison: rev 1ac6a3bd8601dc3376ce01786f115b877b9d338f)
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file16_badTags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractComparerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTagParser.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTags.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/AbstractRecursiveParserWrapperHandler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file15_tags.html
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file16_badTags.html
* (edit) tika-eval/pom.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractProfilerBuilder.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/RecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file15_tags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file17_tagsOutOfOrder.json


> Add structure tags to tika-eval
> -------------------------------
>
>                 Key: TIKA-2791
>                 URL: https://issues.apache.org/jira/browse/TIKA-2791
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in 
> tika-eval.  We could also detect and flag bad structure tags that we may be 
> generating, e.g.: <i><u></i></u>
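
The two ideas in the description (counting structure tags and flagging badly nested ones) can be illustrated with a small stack-based check. This is only a sketch under stated assumptions: the class TagBalanceSketch and its methods are hypothetical names invented for illustration, they are not the ContentTagParser or ContentTags classes added in the commit, and the regex only handles simple tags rather than full HTML.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of tika-eval.
public class TagBalanceSketch {

    // group(1): "/" for a closing tag, group(2): tag name,
    // group(3): "/" for a self-closing tag such as <br/>.
    // Comments, CDATA and doctype declarations are not handled.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)(?:\\s[^<>]*?)?(/?)>");

    /** Counts opening (and self-closing) tags by lower-cased name. */
    public static Map<String, Integer> countOpenTags(String markup) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = TAG.matcher(markup);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                counts.merge(m.group(2).toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns true if every closing tag matches the most recently opened tag. */
    public static boolean isWellNested(String markup) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(markup);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (selfClosing) {
                continue; // neither opens nor closes a scope
            }
            if (!closing) {
                open.push(name);
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false; // closing tag with no matching innermost opener
            }
        }
        return open.isEmpty(); // leftover openers mean unclosed tags
    }
}
```

With this sketch, the example from the description is flagged because </i> arrives while <u> is still the innermost open tag, whereas the same tags closed in reverse order pass the check.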



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
