[ 
https://issues.apache.org/jira/browse/TIKA-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085362#comment-18085362
 ] 

Tim Allison commented on TIKA-4730:
-----------------------------------

I turned on the strict validator for the 4.x run. It throws exceptions on 
unbalanced XHTML. All of the "new" exceptions just flag cases that previously 
hit some kind of parse exception or silently produced bad XHTML.

The 4.x run also includes the new charset detector, which is generally doing a 
lot better but does have some problems. I'll try to quantify that.

pack200 is still a problem.

This was run against 3.3.1, not against 4.0.0-alpha-1.

> Prep for 4.0.0-beta-1 release
> -----------------------------
>
>                 Key: TIKA-4730
>                 URL: https://issues.apache.org/jira/browse/TIKA-4730
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: reports.tar.gz
>
>
> We made a number of important fixes to the published artifacts in ASF's dist 
> repo, Maven Central and Docker.
> I think we're set on changing APIs for 4.x generally.
> Is there anything else we need for this beta release?
> I propose starting the 4.0.0-beta-1 release in two weeks. WDYT?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
