Please find the error stack trace below:

ERROR: agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newNode(HashMap.java:1750)
    at java.util.HashMap.putVal(HashMap.java:631)
    at java.util.HashMap.put(HashMap.java:612)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.HTMLParseState.noteTag(HTMLParseState.java:51)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState.dealWithCharacter(TagParseState.java:638)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleCharacterReceiver.dealWithCharacters(SingleCharacterReceiver.java:51)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:48)
    at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithoutCharsetDetection(Parser.java:99)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:4918)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3852)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:747)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
    at org.apache.commons.compress.archivers.zip.ZipFile.resolveLocalFileHeaderData(ZipFile.java:1059)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:296)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:218)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:201)
    at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:162)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:241)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:173)
    at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:110)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350)
    at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
    at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@3a4621bd{HTTP/1.1}{0.0.0.0:8345}
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@6a57ae10{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-3323783172971878700.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-api-service.war}
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: GC overhead limit exceeded
[Thread-491] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@51c693d{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-3706951886687463454.dir/webapp/,UNAVAILABLE}{/usr/share/manifoldcf/example/./../web/war/mcf-authority-service.war}
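For context, this agents process is started with the JVM options in start-options.env.unix (start-options.env.win on Windows), currently -Xms1024m / -Xmx1024m as mentioned further down in the thread. Since Karl notes below that the agents process may simply need more memory, here is a minimal sketch of what that edit could look like, assuming the one-JVM-option-per-line format of that file; the 4 GB figure is purely illustrative and is not a value recommended anywhere in this thread:

    -Xms4096m
    -Xmx4096m

Whatever value is chosen has to fit inside the 16 GB of RAM on the crawler server alongside PostgreSQL and the operating system.
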
On Fri, Aug 16, 2019 at 3:22 PM Karl Wright <daddy...@gmail.com> wrote:

> Without an out-of-memory stack trace, I cannot definitively point to Tika
> or say that it's a specific kind of file. Please send one.
>
> Karl
>
>
> On Fri, Aug 16, 2019 at 2:09 AM Priya Arora <pr...@smartshore.nl> wrote:
>
> > *Existing threads/connections configuration is:*
> >
> > How many worker threads do you have? - 15 worker threads have been
> > allocated (in the properties.xml file; a sketch of this setting appears
> > at the end of this message).
> > For the Tika Extractor, 10 connections are defined.
> >
> > Do you suggest reducing these numbers further?
> > If not, what else could be a solution?
> >
> > Thanks
> > Priya
> >
> >
> > On Wed, Aug 14, 2019 at 5:32 PM Karl Wright <daddy...@gmail.com> wrote:
> >
> > > How many worker threads do you have?
> > > Even if each worker thread is constrained in memory, and they should be,
> > > you can easily cause things to run out of memory by configuring too many
> > > worker threads. Another way to keep Tika's usage constrained would be to
> > > reduce the number of Tika Extractor connections, because that effectively
> > > limits the number of extractions that can be going on at the same time.
> > >
> > > Karl
> > >
> > >
> > > On Wed, Aug 14, 2019 at 7:23 AM Priya Arora <pr...@smartshore.nl> wrote:
> > >
> > > > Yes, I am using the Tika Extractor, and the ManifoldCF version used is
> > > > 2.13. Also, I am using Postgres as the database.
> > > >
> > > > I have 4 jobs. One accesses/re-crawls data from a public site; the
> > > > other three access intranet sites. Two of those give me correct output
> > > > without any error, and the third one, which has more data than the
> > > > other two, gives me this error.
> > > >
> > > > Is there any possibility of a site accessibility issue? Can you please
> > > > suggest a solution.
> > > > Thanks and regards
> > > > Priya
> > > >
> > > > On Wed, Aug 14, 2019 at 3:11 PM Karl Wright <daddy...@gmail.com> wrote:
> > > >
> > > > > I will need to know more. Do you have the Tika extractor in your
> > > > > pipeline? If so, what version of ManifoldCF are you using? Tika has
> > > > > had bugs related to memory consumption in the past; the out-of-memory
> > > > > exception may be coming from it, and therefore a stack trace is
> > > > > critical to have.
> > > > >
> > > > > Alternatively, you can upgrade to the latest version of MCF (2.13),
> > > > > which has a newer version of Tika without those problems. But you may
> > > > > need to give the agents process more memory.
> > > > >
> > > > > Another possible cause is that you're using HSQLDB in production.
> > > > > HSQLDB keeps all of its tables in memory. If you have a large crawl,
> > > > > you do not want to use HSQLDB.
> > > > >
> > > > > Thanks,
> > > > > Karl
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2019 at 3:41 AM Priya Arora <pr...@smartshore.nl> wrote:
> > > > >
> > > > > > Hi Karl,
> > > > > >
> > > > > > The ManifoldCF log shows an error like:
> > > > > > agents process ran out of memory - shutting down
> > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > >
> > > > > > Also, I have -Xms1024m, -Xmx1024m allocated in the
> > > > > > start-options.env.unix and start-options.env.win files.
> > > > > > The configuration is:
> > > > > > 1) Crawler server - 16 GB RAM and 8-core Intel(R) Xeon(R) CPU
> > > > > > E5-2660 v3 @ 2.60GHz, and
> > > > > > 2) Elasticsearch server - 48 GB RAM and 1-core Intel(R) Xeon(R) CPU
> > > > > > E5-2660 v3 @ 2.60GHz, and I am using Postgres as the database.
> > > > > >
> > > > > > Can you please help me out with what to do in this case?
> > > > > >
> > > > > > Thanks
> > > > > > Priya
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 12:33 PM Karl Wright <daddy...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > The error occurs, I believe, as the result of basic connection
> > > > > > > problems, e.g. the connection is getting rejected. You can find
> > > > > > > more information in the simple history, and in the manifoldcf log.
> > > > > > >
> > > > > > > I would like to know the underlying cause, since the connector
> > > > > > > should be resilient against errors of this kind.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019, 1:46 AM Priya Arora <pr...@smartshore.nl> wrote:
> > > > > > >
> > > > > > > > Hi Karl,
> > > > > > > >
> > > > > > > > I have a web repository connector (seeds: an intranet site), and
> > > > > > > > the job is on the production server.
> > > > > > > >
> > > > > > > > When I ran the job on PROD, the job stopped itself twice with the
> > > > > > > > error: Error: Unexpected HTTP result code: -1: null.
> > > > > > > >
> > > > > > > > Can you please give me an idea of why this happens?
> > > > > > > >
> > > > > > > > Thanks and regards
> > > > > > > > Priya Arora
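
Regarding the worker-thread setting referenced in the quoted thread: if reducing concurrency is preferred over (or combined with) a larger heap, the worker-thread count lives in properties.xml. The sketch below assumes the standard <property name="..." value="..."/> format of that file and the org.apache.manifoldcf.crawler.threads property name from the ManifoldCF properties documentation, which should be verified against the version in use; the value of 10 is only an example of "fewer than the current 15", not a tuned recommendation:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <!-- Number of crawler worker threads; fewer threads bounds how many
           documents (and Tika extractions) are processed at the same time -->
      <property name="org.apache.manifoldcf.crawler.threads" value="10"/>
      <!-- ...other existing properties stay as they are... -->
    </configuration>

Lowering this (or the Tika Extractor transformation connection's maximum connection count in the UI, as Karl suggests) trades crawl throughput for a lower peak memory footprint in the agents process.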