[
https://issues.apache.org/jira/browse/CONNECTORS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641786#comment-13641786
]
Karl Wright commented on CONNECTORS-680:
----------------------------------------
bq. The following does not work either:
You need to make the /bokstav=G.* part optional. Wrap it in parens and follow
it with the ? operator. Or, you can have TWO regexps on separate lines - one
for the bokstav part, and one for the domain part.
Try out your regexps here:
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
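
A rough sketch of both suggestions, for illustration only. It uses Python's re
module as a stand-in for the connector's Java regexps (the constructs used here
behave the same in both), and it assumes that a URL is accepted when any
"Include in crawl" line matches somewhere in the URL. The included() helper, the
PAGE URL, and the exact patterns are made up for this example; they are not
taken from the issue or from ManifoldCF.

```python
import re

# Seed URL from the log message in the issue.
SEED = "http://www.ibsen.uio.no/"
# Hypothetical deeper URL, invented for this illustration.
PAGE = "http://www.ibsen.uio.no/register.xhtml?bokstav=G"


def included(url, patterns):
    """Return True if any pattern matches somewhere in the URL.

    Assumption: one pattern per line, and a single match is enough.
    """
    return any(re.search(p, url) for p in patterns)


# The entry from the issue: the seed URL itself contains no "bokstav=G".
original = [r".*bokstav=G.*"]

# Option 1: parens plus the ? operator make the bokstav part optional.
optional_group = [r"^http://www\.ibsen\.uio\.no/(.*bokstav=G.*)?$"]

# Option 2: two regexps on separate lines, one for the bokstav part
# and one for the domain part.
two_lines = [r"bokstav=G", r"^http://www\.ibsen\.uio\.no/$"]

for name, patterns in [("original", original),
                       ("optional_group", optional_group),
                       ("two_lines", two_lines)]:
    print(name, included(SEED, patterns), included(PAGE, patterns))
```

Under these assumptions only the original entry rejects the seed URL, which
would be consistent with the "Illegal seed URL" warning.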
> "Illegal seed URL" shows up in manifoldcf.log with reg exp entries in
> "Include in crawl" box
> --------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-680
> URL: https://issues.apache.org/jira/browse/CONNECTORS-680
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Reporter: Erlend Garåsen
> Assignee: Erlend Garåsen
> Fix For: ManifoldCF 1.2
>
>
> The following error shows up in manifoldcf.log if there are regular
> expression entries in the "Include in crawl" text box for the web crawler job:
> {code}
> WARN 2013-04-25 14:15:07,431 (Startup thread) - WEB: Illegal seed URL
> 'http://www.ibsen.uio.no/'
> {code}
> This has nothing to do with whether a trailing slash is used; the error shows up
> even though the seed URLs are entered correctly.
> The entry in the "Include in crawl" box used to trigger this error was:
> {code}
> .*bokstav=G.*
> {code}