[ 
https://issues.apache.org/jira/browse/TIKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322710#comment-15322710
 ] 

Jukka Zitting commented on TIKA-2001:
-------------------------------------

By default, Tika extracts only the text content between XML tags, not 
attribute values. Since all the content in this XML file is in attributes, 
nothing gets extracted.
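
To illustrate the effect, here is a minimal sketch that uses the plain JDK SAX API rather than Tika's own XML parser classes. The class name TextOnlyExtraction and the shortened sample documents are made up for illustration. A handler that only listens for character data never sees attribute values, so an attribute-only document yields an empty string:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Tika.
class TextOnlyExtraction {

    // Collects only the character data between tags; attributes are never looked at.
    static String extractText(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        // Shortened variants of the reported file (assumed for the demo).
        String attributesOnly =
                "<spocosy version=\"1.0\"><lineup id=\"0\" del=\"no\"/></spocosy>";
        String withText = "<spocosy><lineup>no</lineup></spocosy>";
        System.out.println("[" + extractText(attributesOnly) + "]"); // prints []
        System.out.println("[" + extractText(withText) + "]");       // prints [no]
    }
}
```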

What kind of output would make sense in this case?

Perhaps something like this:

{noformat}
0 0 2016-06-03 06:21:34 2016-06-03 06:21:37 0.002
  0 0 0 0 0 0 0 0 2016-06-03 06:21:37 no
{noformat}

or like this:

{noformat}
spocosy
  subscription-update subscriptionid 0 requestid 0 last_push 2016-06-03 06:21:34 current_push 2016-06-03 06:21:37 exec 0.002
    lineup id 0 event_participantsFK 0 participantFK 0 lineup_typeFK 0 shirt_number 0 pos 0 enet_pos 0 n 0 ut 2016-06-03 06:21:37 del no
{noformat}
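
Either variant could be produced by a SAX handler along the following lines. This is only a rough sketch on top of the JDK SAX API, not a proposed patch to Tika's parser; the class name AttributeDump and the includeNames flag are invented for the example. It emits one line per element, indented by nesting depth, and relies on the parser reporting attributes in document order, which SAX does not strictly guarantee. Unlike the samples above, it also prints the version attribute of the root element, so its output is not identical to them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Tika.
class AttributeDump {

    // includeNames = false: attribute values only (first variant).
    // includeNames = true: element and attribute names as well (second variant).
    static String dump(String xml, boolean includeNames) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                StringBuilder line = new StringBuilder();
                if (includeNames) {
                    line.append(qName);
                }
                for (int i = 0; i < atts.getLength(); i++) {
                    if (line.length() > 0) {
                        line.append(' ');
                    }
                    if (includeNames) {
                        line.append(atts.getQName(i)).append(' ');
                    }
                    line.append(atts.getValue(i));
                }
                // Skip elements that contribute nothing to the output.
                if (line.length() > 0) {
                    for (int i = 0; i < depth; i++) {
                        out.append("  ");
                    }
                    out.append(line).append('\n');
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened version of the file from the report.
        String xml = "<spocosy version=\"1.0\">"
                + "<subscription-update subscriptionid=\"0\" exec=\"0.002\">"
                + "<lineup id=\"0\" del=\"no\"/>"
                + "</subscription-update></spocosy>";
        System.out.print(dump(xml, false));
        System.out.print(dump(xml, true));
    }
}
```

Which of the two modes is more useful probably depends on whether the attribute names carry meaning for search or indexing.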


> Parsing XML outputs empty string
> --------------------------------
>
>                 Key: TIKA-2001
>                 URL: https://issues.apache.org/jira/browse/TIKA-2001
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11, 1.12, 1.13
>            Reporter: George L. Yermulnik
>            Priority: Minor
>
> I can't get Tika to parse my XML files:
> {code}
> root@spring:/tmp# java -version
> java version "1.8.0_91"
> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
> root@spring:/tmp# cat /tmp/xml/5751061032fbd-7148.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <spocosy version="1.0"><subscription-update subscriptionid="0" requestid="0" last_push="2016-06-03 06:21:34" current_push="2016-06-03 06:21:37" exec="0.002"><lineup id="0" event_participantsFK="0" participantFK="0" lineup_typeFK="0" shirt_number="0" pos="0" enet_pos="0" n="0" ut="2016-06-03 06:21:37" del="no"/></subscription-update></spocosy>
> root@spring:/tmp# for i in 3 2 1; do
>     echo -n "tika-app-1.1${i}.jar: "
>     java -jar tika-app-1.1${i}.jar --text /tmp/xml/5751061032fbd-7148.xml
> done
> tika-app-1.13.jar:
> tika-app-1.12.jar:
> tika-app-1.11.jar:
> root@spring:/tmp#
> {code}
> I'd appreciate any help. Thanks.



