Walter created TIKA-2910:
----------------------------

             Summary: Text extraction using Tika command line and Tika server 
differs
                 Key: TIKA-2910
                 URL: https://issues.apache.org/jira/browse/TIKA-2910
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.21
            Reporter: Walter


When extracting plain text from the very same XML file with the Tika command 
line utility and with Tika in server mode, the results differ.

It looks as if PCDATA in more deeply nested XML structures is simply ignored 
and only an empty line is returned in its place.

I assume both use the same code base. Are there any default settings that may 
differ between the two, or that can be configured?
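
One way to make the difference concrete is to save both outputs and diff them. 
The following is only a minimal sketch, not part of Tika: the function names 
are hypothetical, and it assumes the two text outputs have already been 
captured as strings, for example from the tika-app "--text" option and from a 
PUT to the server's /tika endpoint with "Accept: text/plain".

```python
import difflib


def diff_extractions(cli_text, server_text):
    """Return unified-diff lines between the two extracted texts."""
    return list(
        difflib.unified_diff(
            cli_text.splitlines(),
            server_text.splitlines(),
            fromfile="tika-app",      # label only, not a real file path
            tofile="tika-server",     # label only, not a real file path
            lineterm="",
        )
    )


def missing_from_server(cli_text, server_text):
    """List non-blank CLI lines that do not appear in the server output."""
    server_lines = {line.strip() for line in server_text.splitlines()}
    return [
        line.strip()
        for line in cli_text.splitlines()
        if line.strip() and line.strip() not in server_lines
    ]


if __name__ == "__main__":
    # Made-up sample strings standing in for the two captured outputs.
    cli_out = "title\nnested text\nend\n"
    server_out = "title\n\nend\n"
    print("\n".join(diff_extractions(cli_out, server_out)))
    print(missing_from_server(cli_out, server_out))
```

Attaching such a diff together with a small sample XML file would show 
exactly which nested elements lose their text.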

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
