[
https://issues.apache.org/jira/browse/CONNECTORS-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mr.Keuz updated CONNECTORS-1317:
--------------------------------
Summary: Hang crawling on some ZIP documents (was: Hang parsing on some
ZIP document)
> Hang crawling on some ZIP documents
> -----------------------------------
>
> Key: CONNECTORS-1317
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1317
> Project: ManifoldCF
> Issue Type: Bug
> Components: File system connector
> Affects Versions: ManifoldCF 2.3
> Environment: Ubuntu 14.04 Linux 3.13.0-86-generic i686 i686
> java version "1.8.0_31"
> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
> DB: Postgres 9.5.1
> Reporter: Mr.Keuz
>
> I use ManifoldCF as a file crawler, but I found that the crawling process
> hangs on some ZIP files, although other files are parsed normally.
> Steps to reproduce:
> 1. Run ManifoldCF via "example/start.sh" with Postgres as the DB
> 2. Create a ManifoldCF pipeline: File -> Tika -> Solr
> 3. Put the ZIP file (linked below) in the crawled folder
> 4. Run the job
> Here is the ZIP file that should reproduce the bug:
> "ManifoldCF_ISSUE_Dive.Into.Python.3.Mark.Pilgrim.2009.zip"
> https://yadi.sk/d/0uSdrR5GrsgmG
> Note:
> As far as I could tell (using strace), the crawler process tries to open and
> parse the same ZIP file again and again (apparently from different worker
> threads), and it seems that the document is never removed from the queue.
> I am new to ManifoldCF, so it is hard for me to find the problem in the
> source code.
> I can send additional info if needed.
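
One way to narrow down the report above is to check whether the attached
archive is readable on its own, outside ManifoldCF and Tika. The following is
a minimal Python sketch under that assumption. `scan_zip` is a hypothetical
helper name, not part of ManifoldCF, and it does not reproduce Tika's parsing
or the job queue behaviour. It only reads every entry to the end, which also
triggers the standard library's CRC check.

```python
import zipfile


def scan_zip(path):
    """Read every file entry of a ZIP archive and return (name, size) pairs.

    `path` may be a filesystem path or a seekable binary file object.
    Raises zipfile.BadZipFile if the archive or one of its entries is corrupt.
    This is a hypothetical diagnostic helper, not ManifoldCF or Tika code.
    """
    results = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                # Directory entries carry no data to read.
                continue
            size = 0
            with archive.open(info) as entry:
                # Read in chunks so large entries do not exhaust memory.
                for chunk in iter(lambda: entry.read(64 * 1024), b""):
                    size += len(chunk)
            results.append((info.filename, size))
    return results
```

If this finishes promptly on the attached file, the archive itself is probably
well-formed, which would suggest looking at the pipeline rather than the file.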
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)