Hi Team,

We are using "tika-app-2.6.0.jar" in our product, and we have received a CVE alert for "NEKOHTML".
We also found that "tika-app-2.6.0.jar" uses "NEKOHTML", but we are unable to determine which version of "NEKOHTML" Tika is using. After extracting the jar, we see the following files from nekohtml in tika-app:

>>>
org/cyberneko/html/HTMLElements$Element.class
org/cyberneko/html/HTMLElements$ElementList.class
org/cyberneko/html/HTMLElements.class
org/cyberneko/html/HTMLTagBalancer$ElementEntry.class
org/cyberneko/html/HTMLTagBalancer$Info.class
org/cyberneko/html/HTMLTagBalancer$InfoStack.class
org/cyberneko/html/HTMLTagBalancer.class
<<<

There are also a few entries about nekohtml upgrades in CHANGES.txt:

>>>
22. TIKA-144 - Upgrade nekohtml dependency (Jukka Zitting)
40. TIKA-164 - Upgrade of the nekohtml dependency to 1.9.9 (Jukka Zitting)
<<<

To summarize my questions:

1. What version of "NEKOHTML" is Tika currently using?
2. How is "NEKOHTML" included in tika-app, given that there is no entry for it in pom.xml?
3. Is there a way to list all of Tika's dependencies (including 4th-party libraries)? Running mvn dependency:tree or mvn dependency:list on the source does not help in finding the version of "NEKOHTML".

Any help will be greatly appreciated.

Thanks,
Hanumesh
