[ https://issues.apache.org/jira/browse/TIKA-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115397#comment-14115397 ]
Nick Burch commented on TIKA-1404:
----------------------------------
If you build from source (for example from an svn checkout), you'll find the
tika app jar in tika-app/target and the tika server jar in tika-server/target.
There is no war yet; there's a ticket for that.
Otherwise, snapshot builds are available to download from the CI server:
https://builds.apache.org/job/tika-trunk-jdk1.7/
> tika-server leaking temporary files when converting Word97 (doc)
> ----------------------------------------------------------------
>
> Key: TIKA-1404
> URL: https://issues.apache.org/jira/browse/TIKA-1404
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 1.5
> Environment: Linux (observed on CentOS 6.5 and SuSE SLES 11)
> Reporter: Lukas Graf
> Assignee: Nick Burch
> Attachments: simple_word97.doc
>
>
> When converting Word97 documents (*.doc), tika-server reproducibly leaves
> behind temporary files.
> Steps to reproduce:
> - Start {{tika-app-1.5.jar}} in {{--server}} mode
> - Send a {{*.doc}} file to server for conversion
> - Stop tika-server using CTRL+C or {{kill -15}}
> For example:
> {code}
> lukas@host:~> java -jar tika-app-1.5.jar -v --server --port 8077 --text
> # ...
> lukas@host:/tmp> ls -lah apache-tika-*
> ls: cannot access apache-tika-*: No such file or directory
> lukas@host:/tmp>
> lukas@host:/tmp> netcat 127.0.0.1 8077 < simple_word97.doc
> Simple Word-97 Document
> Lorem Ipsum.
> lukas@host:/tmp> ls -lah apache-tika-*
> -rw-r--r-- 1 lukas users 22K 2014-08-29 15:48 apache-tika-2457738389388821864.tmp
> # after conversion is done, tmp file handles are still open
> lukas@host:/tmp> lsof | grep tika
> java 29857 lukas 32r REG 104,2 28628386 4571740 /home/lukas/tika-app-1.5.jar
> java 29857 lukas 85r REG 104,2 22528 8604717 /tmp/apache-tika-2457738389388821864.tmp
> java 29857 lukas 86r REG 104,2 22528 8604717 /tmp/apache-tika-2457738389388821864.tmp
> # stop tika-server...
> ^C
> lukas@host:~>
> # ...
> lukas@host:/tmp> lsof | grep tika
> lukas@host:/tmp>
> {code}
> No exceptions are thrown, and the plaintext is being extracted correctly from
> the document, but temporary files are still left behind every single time.
>
> This obviously is a major issue in a production environment when converting
> thousands of documents a day. Our temp directories are filling up rapidly,
> and we had to configure cron jobs to clean up after Tika on most of our
> production servers.
>
> I wasn't able to reproduce this issue using {{tika-app-1.5.jar}} in
> non-server mode. However, booting up a JVM for every single conversion is
> just too slow.
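
The cron-based cleanup mentioned in the description could look roughly like the sketch below. It is only an illustration of the workaround, not part of Tika and not a fix for the leak: the function name `sweep_tika_tmp` and the 60-minute threshold are assumptions, and only the `apache-tika-*.tmp` file name pattern comes from the report. It relies on the `-maxdepth`, `-mmin` and `-delete` extensions of GNU find, which should be available on the Linux systems named in the report.

```shell
#!/bin/sh
# Hypothetical helper, not part of Tika: remove leftover Tika temp files.
# Usage: sweep_tika_tmp DIR MIN_AGE_MINUTES
sweep_tika_tmp() {
    dir=$1
    min_age=$2
    # Match only regular files directly inside DIR whose names follow the
    # pattern seen in the report and that were last modified more than
    # MIN_AGE_MINUTES ago. Print each path, then delete it.
    find "$dir" -maxdepth 1 -type f -name 'apache-tika-*.tmp' \
        -mmin "+$min_age" -print -delete
}

# Example call (commented out so that sourcing this file deletes nothing):
# sweep_tika_tmp /tmp 60
```

The age threshold is there so that a file belonging to a conversion still in progress is left alone. Note that on Linux, deleting a file the server still holds open only removes the directory entry; the disk space is released once the process closes the descriptor or exits, so with the open handles shown in the `lsof` output a sweep alone may not free space until tika-server is restarted.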