Author: Alexander Barkov
Email:
Message:

Hello,

> Hello,
>
> I couldn't find any information on this subject.
> As people start using HTTPS, I get more and more problems when crawling with
> links that don't use a specific protocol.
>
> Let's take this example of a link from http://www.example.com/page-a.html :
> <a href="//www.example.com/page-b.html">text</a>
>
> Will be seen as : http://www.example.com/www.example.com/page-b.html
> And of course will cause a 404 error.
>
> Any idea on how to get the right links ?
>
> Thanks.

The crawler stores full URLs in the database. However, you can remove
the protocol at search time, using the search template language.

In 3.4.x, use regex_substr:
http://www.mnogosearch.org/doc34/msearch-templates.html#template-functions

In 3.3.x, use the EREG template operator:
http://www.mnogosearch.org/doc33/msearch-templates-oper.html#templates-oper-misc

Reply: <http://www.mnogosearch.org/board/message.php?id=21809>
_______________________________________________
General mailing list
[email protected]
http://lists.mnogosearch.org/listinfo/general
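
For illustration only, here is a minimal Python sketch of the two operations discussed in this thread. It is not mnogosearch template syntax and does not show how regex_substr or EREG are called; see the documentation links above for that. The function names resolve_link and strip_scheme are hypothetical. The first function shows how a protocol-relative link such as //www.example.com/page-b.html is expected to resolve against the page it was found on (it inherits the scheme of that page). The second shows the kind of regular expression that could be used to drop the protocol from a stored full URL.

```python
import re
from urllib.parse import urljoin

# Matches a leading URL scheme and its colon (e.g. "http:" or "https:"),
# but only when it is followed by "//". Assumption: this pattern is one
# possible choice, not the one used by mnogosearch.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:(?=//)")


def resolve_link(base_url: str, href: str) -> str:
    """Resolve href against the URL of the page it appeared on.

    A href that starts with "//" keeps its own host and path and
    inherits only the scheme from base_url.
    """
    return urljoin(base_url, href)


def strip_scheme(url: str) -> str:
    """Remove the leading scheme, leaving a protocol-relative URL."""
    return SCHEME_RE.sub("", url, count=1)


if __name__ == "__main__":
    base = "http://www.example.com/page-a.html"
    href = "//www.example.com/page-b.html"

    full = resolve_link(base, href)
    print(full)                # http://www.example.com/page-b.html
    print(strip_scheme(full))  # //www.example.com/page-b.html
```

With an https base page, the same link resolves to https://www.example.com/page-b.html, which is why stripping the protocol at display time gives one form for both cases.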
