Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hello,

> Hello,
> 
> I would like to crawl the whole HTML code for each URL.

Perhaps the cached copy is what you're looking for.
In 3.4.x, cached copies are stored in a separate table, "cachedcopy".
They are compressed by default, but compression can
be switched off:

http://www.mnogosearch.org/doc34/msearch-cmdref-cachedcopyencoding.html
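
Once the cached copies are in the database, the full HTML can be read back with an ordinary query. The following is only a rough sketch of that idea, not mnoGoSearch's actual schema or storage format: the column names "url" and "content", the use of zlib for compression, and the helper name fetch_html are all assumptions made for illustration, and an in-memory SQLite table stands in for the real database. Check the real "cachedcopy" table layout and the page linked above before relying on any of it.

```python
import sqlite3
import zlib


def fetch_html(conn, url, compressed=True):
    """Return the stored page source for `url`, or None if there is no row.

    Assumes a hypothetical layout: cachedcopy(url, content), where content
    is a zlib-compressed blob unless compression has been switched off.
    """
    row = conn.execute(
        "SELECT content FROM cachedcopy WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    raw = row[0]
    if compressed:
        raw = zlib.decompress(raw)
    return raw.decode("utf-8")


# Stand-in database so the sketch is runnable on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cachedcopy (url TEXT PRIMARY KEY, content BLOB)")

page = "<html><head><title>t</title></head><body><p>hi</p></body></html>"
conn.execute(
    "INSERT INTO cachedcopy VALUES (?, ?)",
    ("http://example.com/", zlib.compress(page.encode("utf-8"))),
)

html = fetch_html(conn, "http://example.com/")
print(html)  # the complete markup, tags included
```

If compression is switched off, the same helper would be called with compressed=False, so the stored bytes are decoded directly.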


> 
> Is there any way to do this?
> 
> I've tried this in indexer.conf, but it doesn't work:
> 
> Section headhtml               25 2058 "<head([^>]*)>(*.)</head>" $2
> Section bodyhtml               26 2058 "<body([^>]*)>(*.)</body>" $2
> Section htmlcode                25 2058 "<html([^>]*)>(*.)</html>" $2
> 
> Section body                    1       2018    afterheaders    html
> gets the body, but with all HTML tags stripped out :(
> 
> 
> Thank you for your help
> 

Reply: <http://www.mnogosearch.org/board/message.php?id=21773>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general