Author: Alexander Barkov
Email:
Message:
Hello Julien,

> > Hello,
> >
> > > Hello,
> > >
> > > I couldn't find any information on this subject.
> > > As people start using HTTPS, I get more and more problems when crawling with
> > > links that don't use a specific protocol.
> > >
> > > Let's take this example of a link from http://www.example.com/page-a.html
> > > :
> > > <a href="//www.example.com/page-b.html">text</a>
> > >
> > > Will be seen as : http://www.example.com/www.example.com/page-b.html
> > > And of course will cause a 404 error.
> > >
> > > Any idea on how to get the right links ?
> > >
> > > Thanks.
> >
> > The crawler stores full URLs in the database.
> > But you can remove the protocol at search time,
> > using the search template language functionality.
> >
> > In 3.4.x use regex_substr:
> > http://www.mnogosearch.org/doc34/msearch-templates.html#template-functions
> >
> > In 3.3.x use the EREG template operator:
> > http://www.mnogosearch.org/doc33/msearch-templates-oper.html#templates-oper-misc
> >
>
> Hello Alexander,
>
> Thanks for the answer.
> However, the problem occurs on the indexing phase : the crawler tries to index
> http://www.example.com/www.example.com/page-b.html (which does not exist)
> instead of http://www.example.com/page-b.html
>
> Can I prevent those 404 errors ?
>
> Thanks !

Oops. This is not supported yet, indeed. I thought it was.
It should be easy to add this.

Which version are you using?

Reply: <http://www.mnogosearch.org/board/message.php?id=21811>

_______________________________________________
General mailing list
[email protected]
http://lists.mnogosearch.org/listinfo/general
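
For reference, the resolution Julien expects for a link such as "//www.example.com/page-b.html" (it keeps the scheme of the referring page and replaces the host and path) can be sketched in a few lines of Python. This is only an illustration using the standard library's urljoin; it is not mnoGoSearch code, and the helper name resolve_link is hypothetical.

```python
from urllib.parse import urljoin


def resolve_link(base_url: str, href: str) -> str:
    """Resolve an href found on base_url into an absolute URL.

    resolve_link is a hypothetical helper for illustration only.
    urljoin treats an href starting with "//" as carrying its own
    host, so only the scheme is inherited from the base URL.
    """
    return urljoin(base_url, href)


if __name__ == "__main__":
    base = "http://www.example.com/page-a.html"
    href = "//www.example.com/page-b.html"
    # Expected target from the thread: http://www.example.com/page-b.html
    print(resolve_link(base, href))
```

With an https:// base page the same href resolves to an https:// URL, which is why such links are used on sites that serve both protocols.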
