Author: Julien D.
Email: jul...@clustaar.com
Message:
> Hello Julien,
> 
> > > Hello,
> > > 
> > > > Hello,
> > > > 
> > > > I couldn't find any information on this subject.
> > > > As people start using HTTPS, I run into more and more problems when crawling
> > > > links that don't specify a protocol.
> > > > 
> > > > Take this example of a link found on http://www.example.com/page-a.html:
> > > > <a href="//www.example.com/page-b.html">text</a>
> > > > 
> > > > It will be seen as: http://www.example.com/www.example.com/page-b.html
> > > > And of course that will cause a 404 error.
> > > > 
> > > > Any idea how to get the right links?
> > > > 
> > > > Thanks.
> > > 
> > > The crawler stores full URLs in the database.
> > > But you can remove the protocol at search time,
> > > using the search template language functionality.
> > > 
> > > In 3.4.x use regex_substr:
> > > http://www.mnogosearch.org/doc34/msearch-templates.html#template-functions
> > > 
> > > In 3.3.x use the EREG template operator:
> > > http://www.mnogosearch.org/doc33/msearch-templates-oper.html#templates-oper-misc
> > > 
> > 
> > Hello Alexander,
> > 
> > Thanks for the answer.
> > However, the problem occurs during the indexing phase: the crawler tries to index
> > http://www.example.com/www.example.com/page-b.html (which does not exist)
> > instead of http://www.example.com/page-b.html
> > 
> > Can I prevent those 404 errors?
> > 
> > Thanks!
> 
> Oops. This is not supported yet, indeed. I thought it was.
> It should be easy to add this. Which version are you using?
> 

Hello Alexander,

I currently use 3.4.1.

Is there a new release I am not aware of?

Thank you for your quick answers!
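
For reference, the result expected for the link quoted above can be sketched
in Python with the standard library's urljoin, which follows the RFC 3986
rules for resolving references. This is only an illustration of the expected
behaviour, not mnoGoSearch code; the helper name resolve_link is made up for
this example.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into an absolute URL.

    Hypothetical helper for illustration; it is not part of mnoGoSearch.
    """
    # urljoin implements RFC 3986 reference resolution: an href starting
    # with "//" keeps only the scheme of the page it was found on.
    return urljoin(page_url, href)


page = "http://www.example.com/page-a.html"

# Scheme-relative link from the example in this thread.
print(resolve_link(page, "//www.example.com/page-b.html"))
# prints http://www.example.com/page-b.html

# The same link found on an HTTPS page inherits https.
print(resolve_link("https://www.example.com/page-a.html",
                   "//www.example.com/page-b.html"))
# prints https://www.example.com/page-b.html

# If the leading "//" is lost, the href is treated as a relative path,
# which reproduces the wrong URL reported above.
print(resolve_link(page, "www.example.com/page-b.html"))
# prints http://www.example.com/www.example.com/page-b.html
```

The last call is only there to show that the 404 URL is what one gets when
the href is handled as a plain relative path.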

Reply: <http://www.mnogosearch.org/board/message.php?id=21812>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
