[ 
https://issues.apache.org/jira/browse/CONNECTORS-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294803#comment-15294803
 ] 

Mr.Keuz commented on CONNECTORS-1317:
-------------------------------------

I built the trunk version.
The problem still seems to occur.

I checked out trunk from:
https://svn.apache.org/repos/asf/manifoldcf/trunk/
Revision: (r1744253 at 2016-05-17 14:07:44 +0300)

1. ant make-core-deps
2. ant make-deps
3. ant build
4. Run ./dist/example/start.sh
5. Create the same job and run it

The stack trace is as follows:

FATAL 2016-05-21 09:38:22,355 (Worker thread '13') - Error tossed: com/rometools/utils/Lists
java.lang.NoClassDefFoundError: com/rometools/utils/Lists
        at com.rometools.rome.io.impl.Atom10Parser.parseAlternateLinks(Atom10Parser.java:276)
        at com.rometools.rome.io.impl.Atom10Parser.parseFeedMetadata(Atom10Parser.java:148)
        at com.rometools.rome.io.impl.Atom10Parser.parseFeed(Atom10Parser.java:113)
        at com.rometools.rome.io.impl.Atom10Parser.parse(Atom10Parser.java:95)
        at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:318)
        at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:265)
        at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:169)
        at org.apache.tika.parser.feed.FeedParser.parse(FeedParser.java:70)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
        at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:219)
        at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:182)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48)
        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:227)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3224)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3075)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
        at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.processDocuments(FileConnector.java:404)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
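
The NoClassDefFoundError above means that com.rometools.utils.Lists was not visible to the class loader that loaded the ROME parser when the Atom feed inside the archive was parsed. A minimal sketch for checking this, assuming it is run with the same jars on the classpath as the Tika transformation connector (the class name ClasspathCheck and its methods are hypothetical helpers, not part of ManifoldCF or Tika):

```java
import java.security.CodeSource;

public class ClasspathCheck {
    // Returns true if the named class can be loaded by the given loader.
    // The class is not initialized, so no static initializers run.
    public static boolean isClassAvailable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the jar or directory the class was loaded from, or null when the
    // class is missing or its code source is unknown (e.g. JDK core classes).
    public static String locationOf(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return (source == null || source.getLocation() == null)
                    ? null
                    : source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        // Default to the class named in the stack trace; another name may be passed as an argument.
        String name = args.length > 0 ? args[0] : "com.rometools.utils.Lists";
        System.out.println(name + " available: " + isClassAvailable(name, loader));
        System.out.println(name + " loaded from: " + locationOf(name, loader));
    }
}
```

If the check reports the class as unavailable while com.rometools.rome.io.SyndFeedInput is found, a jar that ROME depends on is probably missing from the build output. Note that this only inspects the classpath it is started with, which may differ from the class loader ManifoldCF uses for connectors.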







> Hang crawling job on some ZIP documents
> ---------------------------------------
>
>                 Key: CONNECTORS-1317
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1317
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: File system connector
>    Affects Versions: ManifoldCF 2.3
>         Environment: Ubuntu 14.04 Linux 3.13.0-86-generic i686 i686
> java version "1.8.0_31"
> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
> DB: Postgres 9.5.1
>            Reporter: Mr.Keuz
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.5
>
>
> I use ManifoldCF as a file crawler, but I found that the crawling process
> hangs on some zip files, although other files are parsed normally.
> Steps: 
> 1. Run ManifoldCF with "example/start.sh" and Postgres as the DB
> 2. Create manifold pipeline: File -> Tika -> Solr
> 3. Put the zip file in the folder (attached below)
> 4. Run job
> Here is the zip file that should reproduce the bug: 
> "ManifoldCF_ISSUE_Dive.Into.Python.3.Mark.Pilgrim.2009.zip"
> https://yadi.sk/d/0uSdrR5GrsgmG 
> Note:
> As far as I could tell (using strace), the crawler process tries to open and
> parse the same zip file again and again (apparently from different worker
> threads), and the document does not seem to be removed from the queue.
> I am a newbie in ManifoldCF, so it is hard for me to find the problem in the
> source code.
> I can send additional info if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
