[ 
https://issues.apache.org/jira/browse/CONNECTORS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641786#comment-13641786
 ] 

Karl Wright commented on CONNECTORS-680:
----------------------------------------

bq. The following does not work either:

You need to make the /bokstav=G.* part optional: wrap it in parens and follow it 
with the ? quantifier.  Or, you can have TWO regexps on separate lines - one for 
the bokstav part, and one for the domain part.
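Both options can be sketched in plain java.util.regex. This is an illustration, not ManifoldCF code: the included() helper, the assumption that a URL passes if ANY include regexp is found in it, and the example page URLs (register?bokstav=G etc.) are all hypothetical. The domain regexp in the two-line option is anchored with $ so that it admits only the seed itself; a real job may want a broader domain pattern.

```java
import java.util.List;
import java.util.regex.Pattern;

public class IncludeFilterDemo {
    // ASSUMED semantics for illustration only: a URL passes the include list
    // if at least one pattern is found somewhere in it.
    static boolean included(List<Pattern> patterns, String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    // The entry from the bug report: it can never match the bare seed URL.
    static final List<Pattern> ORIGINAL = List.of(Pattern.compile(".*bokstav=G.*"));

    // Option 1: one regexp, with the bokstav part made optional via ( ... )?
    static final List<Pattern> OPTIONAL_GROUP = List.of(
            Pattern.compile("^http://www\\.ibsen\\.uio\\.no/(.*bokstav=G.*)?$"));

    // Option 2: two regexps on separate lines, one for the domain/seed part
    // and one for the bokstav part.
    static final List<Pattern> TWO_LINES = List.of(
            Pattern.compile("^http://www\\.ibsen\\.uio\\.no/$"),
            Pattern.compile(".*bokstav=G.*"));

    public static void main(String[] args) {
        String seed = "http://www.ibsen.uio.no/";
        // Hypothetical URL shapes for pages in the "G" and "H" letter indexes.
        String letterG = "http://www.ibsen.uio.no/register?bokstav=G";
        String letterH = "http://www.ibsen.uio.no/register?bokstav=H";

        for (List<Pattern> list : List.of(ORIGINAL, OPTIONAL_GROUP, TWO_LINES)) {
            System.out.printf("%s seed=%b G=%b H=%b%n", list,
                    included(list, seed), included(list, letterG), included(list, letterH));
        }
    }
}
```

With the original entry the seed itself fails the include test, which is consistent with the "Illegal seed URL" warning; both rewritten lists accept the seed and the G page while still rejecting the H page.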

Try out your regexps here: 
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html

> "Illegal seed URL" shows up in manifoldcf.log with reg exp entries in 
> "Include in crawl" box
> --------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-680
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-680
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>            Reporter: Erlend Garåsen
>            Assignee: Erlend Garåsen
>             Fix For: ManifoldCF 1.2
>
>
> The following error shows up in manifoldcf.log if there are regular 
> expression entries in the "Include in crawl" text box for the web crawler job:
> {code}
> WARN 2013-04-25 14:15:07,431 (Startup thread) - WEB: Illegal seed URL 
> 'http://www.ibsen.uio.no/'
> {code}
> This has nothing to do with using a trailing slash or not; the error shows up 
> even though the seed URLs are entered correctly.
> The entry in the "Include in crawl" box used to trigger this error was:
> {code}
> .*bokstav=G.*
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
