The issue was my use of regexes in the inclusions list. Oddly enough,
some regexes that I had verified via
http://myregexp.com/signedJar.html, and that should have worked, did not.
However, my crawl is functioning properly, and is only visiting the
appropriate documents.
--mike
On 12/06/2011 02:34 PM, Karl Wright wrote:
On second thought, "illegal seed" can also mean that the seed is
excluded from the crawl due to your inclusion/exclusion regexp lists.
Might want to check that out too.
Karl
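
One way to test a seed against the inclusion/exclusion lists outside the crawler is a small standalone check like the sketch below. It is not ManifoldCF code: the class and method names are hypothetical, and it assumes a URL is accepted when at least one inclusion regexp is found somewhere in it and no exclusion regexp is. The connector's actual matching rules may differ. The second inclusion pattern shows a common pitfall with this kind of URL: an unescaped "?" is a regex quantifier, not a literal question mark.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone checker, not part of ManifoldCF.
public class SeedFilterCheck {

    // True if any regex in the list is found anywhere in the URL.
    public static boolean matchesAny(List<String> regexes, String url) {
        for (String regex : regexes) {
            if (Pattern.compile(regex).matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    // Assumed rule: at least one inclusion matches and no exclusion matches.
    public static boolean isSeedAllowed(String url,
                                        List<String> inclusions,
                                        List<String> exclusions) {
        return matchesAny(inclusions, url) && !matchesAny(exclusions, url);
    }

    public static void main(String[] args) {
        String seed = "https://hostname.com/vwebv/search"
                + "?searchArg=dvd&searchCode=SALL&searchType=1&recCount=100";

        // Illustrative patterns only; these are not the ones used in the thread.
        List<String> escaped =
                Arrays.asList("^https://hostname\\.com/vwebv/search\\?");
        List<String> unescaped =
                Arrays.asList("vwebv/search?searchArg");
        List<String> exclusions = Arrays.asList("\\.(gif|jpg|png)$");

        // The "?" is escaped, so it matches the literal "?" in the URL.
        System.out.println(isSeedAllowed(seed, escaped, exclusions));

        // Unescaped, "h?" means "an optional h", so the literal "?" in the
        // URL is never matched and the seed is rejected.
        System.out.println(isSeedAllowed(seed, unescaped,
                Collections.<String>emptyList()));
    }
}
```

Running this against the real lists from the job configuration shows which pattern, if any, rejects the seed, subject to the matching assumption stated above.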
On Tue, Dec 6, 2011 at 2:23 PM, Karl Wright <daddy...@gmail.com> wrote:
The URL as stated is fine and is pretty standard. I don't think
there's a problem there, unless you inadvertently fixed something when
you changed the hostname.
Can you look at the log - there may well be a stack trace, especially
if you have <property name="org.apache.manifoldcf.connectors"
value="DEBUG"/> set. I'd love to see what the trace is.
Karl
On Tue, Dec 6, 2011 at 1:52 PM, Michael Kelleher <mj.kelle...@gmail.com> wrote:
Here is my seed URL (minus the hostname):
https://hostname.com/vwebv/search?searchArg=dvd&searchCode=SALL&searchType=1&recCount=100
I am using a Web Crawler connection that has been tested with the
NullOutputConnector - so I don't think the issue can be here.
I am also using the Solr Output Connector - this had been throwing an
Exception till I fixed the core name - this is the first time I have used
this. So, maybe I don't have things configured correctly here. However, there
are no exceptions in the log. Also, I am not using authentication at all on
Solr.
I looked at the class:
connectors\webcrawler\connector\src\main\java\org\apache\manifoldcf\crawler\connectors\webcrawler\WebcrawlerConnector.java
and it was not obvious what the issue is.
Also, I changed the logging level to DEBUG in logging.ini and restarted
before I tested the crawl, which makes the logic in
WebcrawlerConnector.java even less clear to me.
Is there somewhere else I can set logging levels? I am not sure my change
to logging.ini is having any effect. Also, is there some other test you
might suggest?
thanks.
--mike