[ https://issues.apache.org/jira/browse/TIKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Barrett resolved TIKA-1464.
-------------------------------
Resolution: Not a Problem
The file leak tool was a big help: it identified a place in our code that was
not closing a TikaInputStream. After fixing that, there are no more open tmp
files. One nagging question remains: why did Tika versions prior to 1.6 not
have this problem? Were previous versions cleaning up open file handles in the
background?
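The fix described in the resolution, closing every TikaInputStream once its document has been parsed, follows the standard Java try-with-resources pattern. Below is a minimal, stdlib-only sketch of that pattern. It deliberately uses `Files.newInputStream` in place of `TikaInputStream` so it runs without Tika on the classpath; since TikaInputStream is itself an InputStream, it could be declared as the `try` resource the same way. `CloseEachStream`, `TrackedStream`, `consume` and `processAll` are hypothetical names for illustration, not Tika APIs. `TrackedStream` counts streams that were opened but not yet closed, a rough in-process analogue of what lsof reported.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CloseEachStream {
    // Number of streams opened but not yet closed (hypothetical leak counter).
    static final AtomicInteger OPEN = new AtomicInteger();

    // Wrapper that increments OPEN on creation and decrements it once on close.
    static final class TrackedStream extends FilterInputStream {
        private boolean closed;

        TrackedStream(InputStream in) {
            super(in);
            OPEN.incrementAndGet();
        }

        @Override
        public void close() throws IOException {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
            super.close(); // closes the underlying file stream
        }
    }

    // Hypothetical stand-in for handing the stream to a parser:
    // it simply drains the stream and returns the number of bytes read.
    static long consume(InputStream in) throws IOException {
        long n = 0;
        byte[] buf = new byte[8192];
        int r;
        while ((r = in.read(buf)) != -1) {
            n += r;
        }
        return n;
    }

    // Processes files sequentially; try-with-resources closes each stream
    // (and releases its file handle) even if consume() throws.
    static long processAll(List<Path> files) throws IOException {
        long total = 0;
        for (Path p : files) {
            try (InputStream in = new TrackedStream(Files.newInputStream(p))) {
                total += consume(in);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            Path p = Files.createTempFile("doc", ".bin");
            Files.write(p, new byte[] {1, 2, 3});
            files.add(p);
        }
        long bytes = processAll(files);
        System.out.println(bytes + " bytes read, " + OPEN.get() + " streams still open");
        for (Path p : files) {
            Files.deleteIfExists(p);
        }
    }
}
```

Running `main` prints `300 bytes read, 0 streams still open`. Removing the `try (...)` and opening the stream with a plain assignment would leave the counter at 100, which is the in-process equivalent of the handles lsof kept reporting.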
> Too many open files in system when parsing thousands of files
> -------------------------------------------------------------
>
> Key: TIKA-1464
> URL: https://issues.apache.org/jira/browse/TIKA-1464
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Environment: OS X 10.10, Windows 8.1 (probably all operating systems)
> Reporter: Tim Barrett
> Priority: Blocker
> Labels: TooManyOpenFilesInSystem
>
> Our big data project parses many thousands of different kinds of files
> sequentially. Up to and including Tika 1.5 this has been trouble-free, and
> Tika has been a pleasure to use. The files parsed are PDF, MS Office and MSG
> files in roughly equal measure.
> We switched to Tika 1.6 last week, and this was a welcome improvement for us,
> as a number of MS Office files that previously failed to parse now parse
> correctly under Tika 1.6.
> However, we have seen a "Too many open files in system" exception raised
> once somewhere above 10,000 files have been parsed. On a Windows server this
> exception is not raised, but the system eventually slows to a crawl.
> Watching the system's behaviour with the Apache Tika tmp files, we see that
> these files *are* being deleted from the file system, but lsof shows all of
> them as still held open by the running process that uses Tika. It appears
> that the files are being deleted but the handles to them are never released.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)