[ https://issues.apache.org/jira/browse/TIKA-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883858#action_12883858 ]

Jukka Zitting commented on TIKA-448:
------------------------------------

The InputStream.skip() method is allowed to return 0 at any time, even before 
the end of the stream; see IO-203 for related discussion.
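For illustration only, here is a minimal sketch of a skip loop that cannot spin when skip() returns 0. It is not Tika or Commons IO code, and the names SkipUtil and skipFully are hypothetical. When skip() makes no progress, the loop falls back to a single read() so it can tell "no progress yet" apart from end of stream. On a truncated input it throws EOFException instead of looping forever.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public final class SkipUtil {
    private SkipUtil() {}

    // Hypothetical helper: skips exactly n bytes or throws EOFException.
    // skip() may legitimately return 0 before end of stream, so a zero
    // return is followed by one read() to check whether we are really at EOF.
    public static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                remaining--; // no skip progress, but one byte consumed via read()
            } else {
                throw new EOFException(remaining + " bytes left to skip at end of stream");
            }
        }
    }
}
```

A loop like this should turn a tag whose declared length runs past the end of the data into an exception rather than a hang.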

It might be easiest to simply always read() the tag content into memory instead 
of trying to skip() it. The performance and memory overhead shouldn't be too 
high.
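A sketch of that suggestion, assuming datainput is a java.io.DataInputStream and datalen is the tag's data size, as in the quoted FLVParser snippet (readTagBody is a hypothetical helper name). DataInputStream.readFully() keeps reading until the buffer is full and throws EOFException if the stream ends first. The FLV tag DataSize field is 24 bits wide, so a single buffer stays just under 16 MiB.

```java
import java.io.DataInputStream;
import java.io.IOException;

public final class TagReader {
    private TagReader() {}

    // Hypothetical helper: reads the whole tag body into memory instead of
    // skipping it. readFully() either fills the array or throws EOFException,
    // so a truncated tag cannot cause an endless loop.
    public static byte[] readTagBody(DataInputStream datainput, int datalen) throws IOException {
        if (datalen < 0) {
            throw new IOException("Invalid FLV tag data size: " + datalen);
        }
        byte[] body = new byte[datalen];
        datainput.readFully(body);
        return body;
    }
}
```

If the bytes are only being discarded, one buffer could be reused across tags to avoid repeated allocation.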

> Tika FLVParser hangs
> --------------------
>
>                 Key: TIKA-448
>                 URL: https://issues.apache.org/jira/browse/TIKA-448
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.7
>         Environment: Linux JDK 1.6u13, Nutch 1.1
>            Reporter: Jeroen van Vianen
>         Attachments: FLVParser.patch
>
>
> I am crawling a site with Nutch and creating an index using Solr.
> After crawling happily for a couple of hours, my Nutch parse phase hangs. A 
> thread dump shows:
> "Thread-12" prio=10 tid=0xb4974000 nid=0x1b1b runnable [0xb4a50000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FilterInputStream.skip(FilterInputStream.java:125)
>         at org.apache.tika.parser.video.FLVParser.parse(FLVParser.java:246)
>         at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:95)
>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:85)
>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:41)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> The only explanation I can see for the code being stuck there is that
> skip(datalen - skiplen) keeps returning 0 for some reason in
> org.apache.tika.parser.video.FLVParser.parse, around line 246:
>                 // Tag was not metadata, skip over data we cannot handle
>                 for (int skiplen = 0; skiplen < datalen;) {
>                     long currentSkipLen = datainput.skip(datalen - skiplen);
>                     skiplen += currentSkipLen;
>                 }
> As I don't know which downloaded FLV file caused the problem, I cannot 
> easily create a test case.
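One plausible trigger, offered as an assumption rather than a confirmed diagnosis, is a truncated FLV whose last tag declares more data than remains in the stream. A ByteArrayInputStream, like many streams, returns 0 from skip() once it is at end of stream, so the quoted loop never gets past the bytes that actually exist. The sketch below reruns the same loop shape with an iteration cap; SkipLoopDemo and maxIterations are hypothetical additions so the demo terminates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public final class SkipLoopDemo {
    // Same loop shape as the quoted FLVParser code, plus an iteration cap
    // so the demo terminates. Returns how many bytes were actually skipped.
    public static int runOriginalLoop(InputStream datainput, int datalen, int maxIterations)
            throws IOException {
        int skiplen = 0;
        for (int i = 0; skiplen < datalen && i < maxIterations; i++) {
            long currentSkipLen = datainput.skip(datalen - skiplen);
            skiplen += currentSkipLen; // adds 0 on every pass once at end of stream
        }
        return skiplen;
    }

    public static void main(String[] args) throws IOException {
        // The tag claims 100 bytes of data, but only 10 bytes remain in the stream.
        InputStream truncated = new ByteArrayInputStream(new byte[10]);
        int skipped = runOriginalLoop(truncated, 100, 1_000_000);
        System.out.println(skipped); // prints 10: no progress after the first iteration
    }
}
```

Without the cap, the loop condition skiplen < datalen stays true forever, which matches the RUNNABLE thread stuck in skip() in the dump above.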

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
