[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500554#comment-17500554 ]

Naama Hophstatder commented on TIKA-3684:
-----------------------------------------

Thanks for your efforts, I took your xml config file and it works.

Just to mention, the correct way for me to configure tika-server in docker (version 2.1.0) is by this command:

docker run -d -p 9999:9998 -v `pwd`/tika-config.xml:/tika-config.xml apache/tika:2.1.0 --config /tika-config.xml

(taken from [tika docker repo|https://github.com/apache/tika-docker])

> Extract text returns the text multiple times
> --------------------------------------------
>
>                 Key: TIKA-3684
>                 URL: https://issues.apache.org/jira/browse/TIKA-3684
>             Project: Tika
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 2.1.0
>            Reporter: Naama Hophstatder
>            Priority: Major
>         Attachments: example.docx, example.json, tika-config-no-xmf.xml
>
>
> We are using tika docker container as a linux service, when I want to extract
> text from a word document, e.g.:
> curl -T example.docx http://localhost:9998/tika --header "Accept: text/plain"
> we get the text 3 times.
> Notice: We also have tika server v1.14, and this version returns the text
> just as expected.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)