[ https://issues.apache.org/jira/browse/TIKA-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Giuseppe Totaro updated TIKA-1184:
----------------------------------

    Attachment: ansi.sys

Hello,
I've just run the tika-app-1.4.jar against files extracted from a disk image, and Tika hangs on a .sys file (attached). I tried tika-app-1.6-SNAPSHOT.jar and it worked fine.

> Infinite halt on parsing old files (e.g. mp3, ms-dos drivers, ...)
> ------------------------------------------------------------------
>
>                 Key: TIKA-1184
>                 URL: https://issues.apache.org/jira/browse/TIKA-1184
>             Project: Tika
>          Issue Type: Bug
>          Components: cli, parser
>    Affects Versions: 1.4
>         Environment: SUSE Linux Enterprise Server 11 SP3 (x86_64)
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pxa6470sr4fp2-20130426_01(SR4 FP2))
> IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References
> 20130422_146026 (JIT enabled, AOT enabled)
> J9VM - R26_Java726_SR4_FP2_20130422_1320_B146026
> JIT - r11.b03_20130131_32403ifx4
> GC - R26_Java726_SR4_FP2_20130422_1320_B146026_CMPRSS
> J9CL - 20130422_146026)
> JCL - 20130425_01 based on Oracle 7u21-b09
>            Reporter: Jürgen Enge
>         Attachments: ansi.sys, ansi.sys, ansi.sys
>
>
> tika hangs on identifying several types of files. the following example is an
> mp3 file with corrupt metadata. other filetypes which have the same problem
> are for example MSDOS device drivers (*.sys)
> i am not into java programming, but my guess would be, that tika is trying to
> seek() within a file and the target position is greater than filesize.
>
> java -jar tika-app-1.4.jar -m /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
> [hangs forever without error message]
> ffmpeg gives some warnings about duration errors...
>
> ffmpeg -i /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
> [mp3 @ 0x633240] max_analyze_duration 5000000 reached at 5015510
> [mp3 @ 0x633240] Estimating duration from bitrate, this may be inaccurate
> Input #0, mp3, from '/u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e':
>   Metadata:
>     artist :
>     album :
>   Duration: 00:15:29.10, start: 0.000000, bitrate: 192 kb/s
>     Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16, 192 kb/s

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
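
For anyone batch-processing extracted files with an affected version, one possible stopgap is to run each extraction in a child process with a time limit, so a single hanging file cannot stall the whole run. The following is only a sketch, not part of Tika: the helper name run_with_timeout and the 60-second limit are assumptions chosen for illustration, and the java command line is the one quoted in the report above.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process and return its stdout as text.

    Returns None if the process is still running after timeout_s seconds.
    subprocess.run kills the child itself when the timeout expires.
    (Hypothetical helper for illustration; not a Tika API.)
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


if __name__ == "__main__":
    # Usage: python this_script.py <file-to-inspect>
    # Command line taken from the report; the 60 s limit is an arbitrary choice.
    target = sys.argv[1]
    output = run_with_timeout(
        ["java", "-jar", "tika-app-1.4.jar", "-m", target], 60
    )
    if output is None:
        print("timed out: " + target, file=sys.stderr)
    else:
        print(output)
```

A file that triggers the hang would be reported as timed out instead of blocking, and the remaining files can still be processed. This does not address the underlying parser problem.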