Please find the error stack trace below:

ERROR: agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.HashMap.newNode(HashMap.java:1750)
        at java.util.HashMap.putVal(HashMap.java:631)
        at java.util.HashMap.put(HashMap.java:612)
        at org.apache.manifoldcf.connectorcommon.fuzzyml.HTMLParseState.noteTag(HTMLParseState.java:51)
        at org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState.dealWithCharacter(TagParseState.java:638)
        at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver.dealWithCharacters(SingleCharacterReceiver.java:51)
        at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:48)
        at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithoutCharsetDetection(Parser.java:99)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:4918)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3852)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:747)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
        at org.apache.commons.compress.archivers.zip.ZipFile.resolveLocalFileHeaderData(ZipFile.java:1059)
        at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:296)
        at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:218)
        at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:201)
        at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:162)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:241)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:173)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:110)
        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
        at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350)
        at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
        at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@3a4621bd{HTTP/1.1}{0.0.0.0:8345}
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6a57ae10{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-3323783172971878700.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-api-service.war}
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@51c693d{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3706951886687463454.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-authority-service.war}

On Fri, Aug 16, 2019 at 3:22 PM Karl Wright <daddy...@gmail.com> wrote:

> Without an out-of-memory stack trace, I cannot definitively point to Tika
> or say that it's a specific kind of file.  Please send one.
>
> Karl
>
>
> On Fri, Aug 16, 2019 at 2:09 AM Priya Arora <pr...@smartshore.nl> wrote:
>
> > *Existing threads/connections configuration:*
> >
> > How many worker threads do you have? - 15 worker threads have been
> > allocated (in the properties.xml file).
> > For the Tika Extractor, 10 connections are defined.
> >
> > Do you suggest reducing these numbers further?
> > If not, what else could be a solution?
> >
> > Thanks
> > Priya
> >
> >
> >
> > On Wed, Aug 14, 2019 at 5:32 PM Karl Wright <daddy...@gmail.com> wrote:
> >
> > > How many worker threads do you have?
> > > Even if each worker thread is constrained in memory, and they should be,
> > > you can easily cause things to run out of memory by configuring too many
> > > worker threads.  Another way to keep Tika's memory usage constrained
> > > would be to reduce the number of Tika Extractor connections, because that
> > > effectively limits the number of extractions that can be going on at the
> > > same time.
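> > >
> > > For example, the worker-thread count lives in properties.xml; a sketch
> > > (the property name is the standard worker-thread setting, and the value
> > > of 10 is illustrative, not a recommendation):
> > >
> > >   <property name="org.apache.manifoldcf.crawler.threads" value="10"/>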
> > >
> > > Karl
> > >
> > >
> > > On Wed, Aug 14, 2019 at 7:23 AM Priya Arora <pr...@smartshore.nl>
> wrote:
> > >
> > > > Yes , I am using Tika Extractor. And the version used for manifold is
> > > 2.13.
> > > > Also I am using postgres as database.
> > > >
> > > > I have 4 types of jobs
> > > > One is accessing/re crawling data from a public site. Other three are
> > > > accessing intranet site.
> > > > Out of which two are giving me correct output-without any error and
> > third
> > > > one which is having data more than the other two , and  giving me
> this
> > > > error.
> > > >
> > > > Is there any possibility with site accessibility issue. Can you
> please
> > > > suggest some solution
> > > > Thanks and regards
> > > > Priya
> > > >
> > > > On Wed, Aug 14, 2019 at 3:11 PM Karl Wright <daddy...@gmail.com> wrote:
> > > >
> > > > > I will need to know more.  Do you have the Tika extractor in your
> > > > > pipeline?  If so, what version of ManifoldCF are you using?  Tika has
> > > > > had bugs related to memory consumption in the past; the out-of-memory
> > > > > exception may be coming from it, and therefore a stack trace is
> > > > > critical to have.
> > > > >
> > > > > Alternatively, you can upgrade to the latest version of MCF (2.13),
> > > > > which has a newer version of Tika without those problems.  But you
> > > > > may need to give the agents process more memory.
> > > > >
> > > > > Another possible cause is that you're using HSQLDB in production.
> > > > > HSQLDB keeps all of its tables in memory.  If you have a large crawl,
> > > > > you do not want to use HSQLDB.
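> > > > >
> > > > > For reference, a sketch of pointing MCF at PostgreSQL instead, in
> > > > > properties.xml (the class name below is the standard PostgreSQL
> > > > > implementation class; verify it against your MCF version's docs):
> > > > >
> > > > >   <property name="org.apache.manifoldcf.databaseimplementationclass"
> > > > >    value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>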
> > > > >
> > > > > Thanks,
> > > > > Karl
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2019 at 3:41 AM Priya Arora <pr...@smartshore.nl> wrote:
> > > > >
> > > > > > Hi Karl,
> > > > > >
> > > > > > The ManifoldCF logs show an error like:
> > > > > > agents process ran out of memory - shutting down
> > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > >
> > > > > > I have -Xms1024m and -Xmx1024m allocated in the
> > > > > > start-options.env.unix and start-options.env.win files.
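> > > > > >
> > > > > > For reference, a sketch of raising the agents heap in
> > > > > > start-options.env.unix (the 4096m value is illustrative only; the
> > > > > > right figure depends on what else runs on the 16 GB host):
> > > > > >
> > > > > >   -Xms1024m
> > > > > >   -Xmx4096m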
> > > > > > The server configuration is:
> > > > > > 1) Crawler server - 16 GB RAM, 8-core Intel(R) Xeon(R) CPU E5-2660
> > > > > >    v3 @ 2.60GHz
> > > > > > 2) Elasticsearch server - 48 GB RAM, 1-core Intel(R) Xeon(R) CPU
> > > > > >    E5-2660 v3 @ 2.60GHz; and I am using Postgres as the database.
> > > > > >
> > > > > > Can you please advise what to do in this case?
> > > > > >
> > > > > > Thanks
> > > > > > Priya
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 12:33 PM Karl Wright <daddy...@gmail.com> wrote:
> > > > > >
> > > > > > > The error occurs, I believe, as the result of basic connection
> > > > > > > problems, e.g. the connection is getting rejected.  You can find
> > > > > > > more information in the simple history, and in the manifoldcf log.
> > > > > > >
> > > > > > > I would like to know the underlying cause, since the connector
> > > > > > > should be resilient against errors of this kind.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019, 1:46 AM Priya Arora <pr...@smartshore.nl> wrote:
> > > > > > >
> > > > > > > > Hi Karl,
> > > > > > > >
> > > > > > > > I have a Web repository connector (seeds: an intranet site),
> > > > > > > > and the job is on a production server.
> > > > > > > >
> > > > > > > > When I ran the job on PROD, the job stopped itself twice with
> > > > > > > > the error: Unexpected HTTP result code: -1: null.
> > > > > > > >
> > > > > > > > Can you please give me an idea of why this happens?
> > > > > > > >
> > > > > > > > Thanks and regards
> > > > > > > > Priya Arora
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
