Walter created TIKA-2910:
----------------------------
Summary: Text extraction using Tika command line and Tika server differs
Key: TIKA-2910
URL: https://issues.apache.org/jira/browse/TIKA-2910
Project: Tika
Issue Type: Bug
Affects Versions: 1.21
Reporter: Walter
When extracting plain text (TXT) from the same XML file with the Tika command-line
utility and with Tika in server mode, the results differ.
It looks as if PCDATA in more deeply nested XML elements is simply ignored, and
only an empty line is returned.
I assume both use the same underlying code. Are there any default settings that
differ between the two modes, or that can be configured?