[ https://issues.apache.org/jira/browse/TIKA-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341674#comment-16341674 ]
Hudson commented on TIKA-2551:
------------------------------
SUCCESS: Integrated in Jenkins build Tika-trunk #1426 (See
[https://builds.apache.org/job/Tika-trunk/1426/])
TIKA-2551: No longer hardcode HtmlParser for XML files in tika-server.
(tallison:
[https://github.com/apache/tika/commit/066e60d5d6de8d51124c297410e7a4eca787d143])
* (edit) CHANGES.txt
* (edit) tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
> Tika Server uses HtmlParser for XML no matter what config is given, even if
> XML is disabled in Config
> -----------------------------------------------------------------------------------------------------
>
> Key: TIKA-2551
> URL: https://issues.apache.org/jira/browse/TIKA-2551
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 1.17
> Reporter: Nick Burch
> Priority: Major
> Fix For: 2.0
>
>
> For some reason, the Tika Server has this line in TikaResource.java:
> {code}
> parsers.put(MediaType.APPLICATION_XML, new HtmlParser());
> {code}
> The upshot is that the Tika Server (only) will always use the
> HtmlParser for XML files, no matter what is configured in the Tika Config. If
> you disable XML in the Tika Config, or assign it to a different parser, this
> will be silently ignored.
> To test, run the Tika Server with the {{TIKA-866-valid.xml}} test file from
> {{tika-core/src/test/resources/org/apache/tika/config}} which uses the
> EmptyParser for everything. If you ask the server what parsers it has, it
> correctly reports none at http://localhost:9998/parsers . If you give it an
> XML file, you'd expect it to fall through to the fallback parser (or possibly
> the empty parser). Instead, it gets processed as HTML, which is completely
> unexpected!
> Originally discovered via
> https://stackoverflow.com/questions/48391615/tell-tika-not-to-parse-xml
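
The shadowing described in the issue can be illustrated with a small, self-contained sketch. It does not use any Tika classes: the Parser interface, the string media types, and the helper names (named, withHardcodedOverride, configOnly, resolve) are all hypothetical stand-ins, and this is not the actual change made in TikaResource.java. It only shows why a put() applied after the configured entries always wins, whatever the config says.

```java
import java.util.HashMap;
import java.util.Map;

final class HardcodedOverrideSketch {

    /** Hypothetical stand-in for a parser; this is NOT a Tika type. */
    interface Parser {
        String name();
    }

    /** Creates a stand-in parser that only knows its own name. */
    static Parser named(String name) {
        return () -> name;
    }

    /** Copies the configured mappings, then applies a hardcoded XML entry on top. */
    static Map<String, Parser> withHardcodedOverride(Map<String, Parser> configured) {
        Map<String, Parser> parsers = new HashMap<>(configured);
        // Same shape as the line quoted in the issue: the later put() replaces
        // whatever the configuration assigned to this media type.
        parsers.put("application/xml", named("HtmlParser"));
        return parsers;
    }

    /** Copies the configured mappings and adds nothing else. */
    static Map<String, Parser> configOnly(Map<String, Parser> configured) {
        return new HashMap<>(configured);
    }

    /** Looks up a parser name, using a placeholder when nothing is mapped. */
    static String resolve(Map<String, Parser> parsers, String mediaType) {
        Parser parser = parsers.get(mediaType);
        return parser == null ? "fallback" : parser.name();
    }

    public static void main(String[] args) {
        // Assumed configuration: XML is assigned to an "EmptyParser" stand-in.
        Map<String, Parser> configured = new HashMap<>();
        configured.put("application/xml", named("EmptyParser"));

        System.out.println(resolve(withHardcodedOverride(configured), "application/xml"));
        System.out.println(resolve(configOnly(configured), "application/xml"));
    }
}
```

In this sketch the first lookup returns the hardcoded entry even though the configuration asked for something else, and even an empty configuration would still get an XML mapping. Removing the hardcoded put() lets the configured (or missing) mapping decide.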