[ https://issues.apache.org/jira/browse/CONNECTORS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826755#comment-16826755 ]
Donald Van den Driessche commented on CONNECTORS-1602:
------------------------------------------------------

Karl,

The website we're crawling also needs session-based login. What happens with cookies in a continuous crawl?

> Continuous crawling doesn't recrawl everything
> ----------------------------------------------
>
>                 Key: CONNECTORS-1602
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1602
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>            Reporter: Donald Van den Driessche
>            Priority: Major
>
> When crawling a website in continuous crawling mode, we saw that not all
> documents are recrawled.
> The site is quite extensive. We found that after crawling, a
> document/page gets a recrawl timestamp between the recrawl interval and
> the max recrawl interval.
> But if these timestamps fall within the first crawl, Manifold starts
> recrawling those documents but seems to ignore the rest of the website.
> Also, some documents sometimes get recrawled 5 times while others don't get
> recrawled at all, apparently due to the same issue.
>
> Is it possible to shed a bit more light on continuous crawling?
> Is it a good system to use for crawling an (extensive) website?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)