"Do you agree that a well-formed URL is what java.net.URL will accept in the constructor's argument? Then www.example.org will fail, but http://www.example.org (without a trailing slash) will pass."
I might even go a bit further. See the following method in WebcrawlerConnector:

protected String makeDocumentIdentifier(String parentIdentifier, String rawURL, DocumentURLFilter filter)

Thanks!
Karl

On Fri, Mar 16, 2012 at 5:52 AM, Erlend Garåsen wrote:
> On 15.03.12 19.30, Karl Wright wrote:
>>
>> A seed can be a specific html file, so complaining about a trailing
>> slash would make that not work. For example:
>>
>> http://hello.world.com/startpage.html
>
>
> I think I was a little bit unclear in my recent email. By a trailing slash,
> I was thinking more about the domain name itself, e.g. www.example.org/.
>
> I will create a Jira ticket now, but I will only focus on well-formed
> URLs in the seeds list.
>
> Do you agree that a well-formed URL is what java.net.URL will accept in the
> constructor's argument? Then www.example.org will fail, but
> http://www.example.org (without a trailing slash) will pass.
>
>
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
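Here is a minimal sketch of the seed check proposed above. It assumes each seed string is simply handed to the java.net.URL constructor, and that a MalformedURLException means the seed is not well-formed. The class SeedCheck and the method parseSeed are hypothetical and are not part of WebcrawlerConnector. The empty-path test only shows how a bare domain such as http://www.example.org (no trailing slash) could be told apart from http://www.example.org/ or a specific page. It is not a description of what the connector actually does. Note that the URL constructors are deprecated as of Java 20 but still behave as described.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SeedCheck {

    /**
     * Hypothetical helper: returns the parsed URL if java.net.URL accepts the
     * seed (the "well-formed" criterion discussed in the thread), or null if
     * the constructor throws MalformedURLException (e.g. no protocol given).
     */
    static URL parseSeed(String seed) {
        try {
            return new URL(seed);
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] seeds = {
            "www.example.org",                       // no protocol: rejected
            "http://www.example.org",                // bare domain, path is ""
            "http://www.example.org/",               // trailing slash, path is "/"
            "http://hello.world.com/startpage.html"  // specific page, path is "/startpage.html"
        };
        for (String seed : seeds) {
            URL url = parseSeed(seed);
            if (url == null) {
                System.out.println(seed + " -> rejected (not well-formed)");
            } else {
                // An empty path means the seed names only the host, with no trailing slash.
                System.out.println(seed + " -> accepted, empty path: " + url.getPath().isEmpty());
            }
        }
    }
}
```

Running it should report www.example.org as rejected and the other three as accepted, with an empty path only for http://www.example.org. A seed that points at a specific HTML file is therefore still accepted, which addresses the concern about not rejecting seeds such as startpage.html.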
