[ 
https://issues.apache.org/jira/browse/TIKA-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754154#comment-17754154
 ] 

Tim Allison commented on TIKA-4109:
-----------------------------------

I think jsoup (https://jsoup.org/) would be the natural replacement. It will 
take some work to exchange the two parsers and get equivalent results, but it 
should be possible.

> Remove use of EOL component TagSoup 1.2.1 from tika-parsers-standard-package
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-4109
>                 URL: https://issues.apache.org/jira/browse/TIKA-4109
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Sandeep Kulkarni
>            Priority: Major
>
> tika-parsers-standard-package has a dependency on
> *org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1*. Source code scanners flag it as
> EOL because there has been no new version in 10+ years.
> That project is no longer maintained, and its source code is also no longer
> available. The homepage is not reachable either:
> [http://home.ccil.org/~cowan/XML/tagsoup/.|http://home.ccil.org/~cowan/XML/tagsoup/]
> There is a fork on GitHub:
> [https://github.com/zmokhtar/TagSoup-Webs.] But there does not seem to be
> any further activity there either.
> Is it possible to remove the use of TagSoup 1.2.1 by using an alternative?
> If I were aware of one, I would have suggested it myself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
