[ https://issues.apache.org/jira/browse/TIKA-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giuseppe Totaro updated TIKA-1184:
----------------------------------

    Attachment: ansi.sys

Hello,

I've just run tika-app-1.4.jar against files extracted from a disk image,
and Tika hangs on a .sys file (attached).
With tika-app-1.6-SNAPSHOT.jar, the same file is parsed without problems.
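The 1.4 CLI hangs without any error message, so a batch run over disk-image contents can stall on a single file. The sketch below shows one way to impose a per-file time limit using only java.util.concurrent. `ParseWatchdog`, `parseWithTimeout` and the stand-in tasks are hypothetical names and are not part of the Tika API. In real use, the Callable would wrap the actual parse call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseWatchdog {
    // Hypothetical helper: runs a parse task on a worker thread and gives up
    // after timeoutMs. Returns the task's result, or null on timeout.
    static <T> T parseWithTimeout(Callable<T> task, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a loop that never checks
            // the interrupt flag keeps running until the JVM exits.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new RuntimeException("parse failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a parser: one that finishes, one that never returns.
        Callable<String> fast = () -> "ok";
        Callable<String> hanging = () -> {
            while (true) { /* simulates a parser stuck in an endless loop */ }
        };
        System.out.println("fast: " + parseWithTimeout(fast, 1000));   // prints "fast: ok"
        System.out.println("hung: " + parseWithTimeout(hanging, 200)); // prints "hung: null"
    }
}
```

Java cannot safely kill a thread, so the timed-out worker may keep consuming CPU. If cleanup must be guaranteed, run one `java -jar tika-app` process per file and kill the process on timeout.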

> Infinite halt on parsing old files (e.g. mp3, ms-dos drivers, ...)
> ------------------------------------------------------------------
>
>                 Key: TIKA-1184
>                 URL: https://issues.apache.org/jira/browse/TIKA-1184
>             Project: Tika
>          Issue Type: Bug
>          Components: cli, parser
>    Affects Versions: 1.4
>         Environment: SUSE Linux Enterprise Server 11 SP3  (x86_64)
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pxa6470sr4fp2-20130426_01(SR4 FP2))
> IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References 20130422_146026 (JIT enabled, AOT enabled)
> J9VM - R26_Java726_SR4_FP2_20130422_1320_B146026
> JIT  - r11.b03_20130131_32403ifx4
> GC   - R26_Java726_SR4_FP2_20130422_1320_B146026_CMPRSS
> J9CL - 20130422_146026)
> JCL - 20130425_01 based on Oracle 7u21-b09
>            Reporter: Jürgen Enge
>         Attachments: ansi.sys, ansi.sys, ansi.sys
>
>
> Tika hangs when identifying several types of files. The following example is an
> MP3 file with corrupt metadata. Other file types with the same problem
> include, for example, MS-DOS device drivers (*.sys).
> I am not into Java programming, but my guess would be that Tika is trying to
> seek() within a file to a target position greater than the file size.
> > java -jar tika-app-1.4.jar -m /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
> [hangs forever without error message]
> ffmpeg gives some warnings about the duration:
> > ffmpeg -i /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
> [mp3 @ 0x633240] max_analyze_duration 5000000 reached at 5015510
> [mp3 @ 0x633240] Estimating duration from bitrate, this may be inaccurate
> Input #0, mp3, from '/u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e':
>   Metadata:
>     artist          : 
>     album           : 
>   Duration: 00:15:29.10, start: 0.000000, bitrate: 192 kb/s
>     Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16, 192 kb/s
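The reporter's guess about seeking past the end of the file matches a common failure mode in Java stream code. The sketch below illustrates it under that assumption; this thread does not identify the actual cause in Tika 1.4. `InputStream.skip(n)` may skip fewer bytes than requested and returns 0 at end of stream. A loop that only subtracts skip()'s return value can therefore spin forever on a truncated or corrupt file. The class and method names here (`SafeSkip`, `naiveSkip`, `skipFully`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SafeSkip {
    // Anti-pattern, shown for contrast and never called here: if the requested
    // offset lies past the end of the data, skip() keeps returning 0 and this
    // loop never terminates.
    static void naiveSkip(InputStream in, long n) throws IOException {
        while (n > 0) {
            n -= in.skip(n);
        }
    }

    // Safer variant: when skip() makes no progress, probe with read() to tell
    // "no progress right now" apart from end of stream, and fail fast at EOF.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException(n + " bytes left to skip at end of stream");
            } else {
                n--; // read() consumed one byte
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16];
        skipFully(new ByteArrayInputStream(data), 10); // within bounds: returns normally
        try {
            skipFully(new ByteArrayInputStream(data), 100); // offset past end of data
        } catch (EOFException e) {
            System.out.println("caught: " + e.getMessage()); // prints "caught: 84 bytes left to skip at end of stream"
        }
    }
}
```

On Java 12 and later, `InputStream.skipNBytes(long)` gives similar EOF-checked behavior. The environment in this report is Java 7, which lacks that method.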



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
