[ https://issues.apache.org/jira/browse/TIKA-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-448.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0
         Assignee: Jukka Zitting

Fixed in revision 1179964.
                
> Tika FLVParser hangs
> --------------------
>
>                 Key: TIKA-448
>                 URL: https://issues.apache.org/jira/browse/TIKA-448
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.7
>         Environment: Linux JDK 1.6u13, Nutch 1.1
>            Reporter: Jeroen van Vianen
>            Assignee: Jukka Zitting
>             Fix For: 1.0
>
>         Attachments: FLVParser.patch
>
>
> I am crawling a site with Nutch and building an index with Solr.
> After a couple of hours of trouble-free crawling, the Nutch parse phase hangs. A
> thread dump shows:
> "Thread-12" prio=10 tid=0xb4974000 nid=0x1b1b runnable [0xb4a50000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FilterInputStream.skip(FilterInputStream.java:125)
>         at org.apache.tika.parser.video.FLVParser.parse(FLVParser.java:246)
>         at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:95)
>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:85)
>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:41)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> The only explanation I can see for the code being stuck there is that
> skip(datalen - skiplen) keeps returning 0 for whatever reason, so skiplen
> never advances and the loop never terminates. The loop is in
> org.apache.tika.parser.video.FLVParser.parse around line 246:
>                 // Tag was not metadata, skip over data we cannot handle
>                 for (int skiplen = 0; skiplen < datalen;) {
>                     long currentSkipLen = datainput.skip(datalen - skiplen);
>                     skiplen += currentSkipLen;
>                 }
> Because I don't know which downloaded FLV file caused the problem, I cannot
> easily create a test case.
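
The quoted loop can spin indefinitely because InputStream.skip is permitted to return 0, for instance once the underlying stream has reached its end, and the loop has no exit for that case. What follows is a minimal sketch of a skip helper that always makes progress or fails. The class and method names (SkipFully, skipFully) are hypothetical and chosen for illustration; this is not necessarily the change committed in revision 1179964.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipFully {

    /**
     * Skips exactly n bytes, or throws EOFException if the stream ends first.
     * Hypothetical helper for illustration; not part of the Tika API.
     */
    public static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() made no progress and read() confirms end of stream:
                // fail instead of looping forever.
                throw new EOFException(remaining + " bytes left to skip at end of stream");
            } else {
                // skip() returned 0 but the stream still had data;
                // the single read() above consumed one byte.
                remaining--;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        skipFully(in, 4);
        System.out.println("bytes left after skipping 4 of 10: " + in.available());
        try {
            // A truncated tag: more bytes requested than the stream holds.
            skipFully(in, 100);
        } catch (EOFException e) {
            System.out.println("truncated input detected: " + e.getMessage());
        }
    }
}
```

Falling back to read() when skip() returns 0 distinguishes "no progress right now" from "end of stream", so a truncated FLV file would surface as an exception rather than a hung parse thread.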
