[General] Webboard: Index full html code in DDB

2016-05-30 Thread bar
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Hello,

> Hello,
> 
> I would like to crawl the whole HTML code for each URL.

Perhaps a cached copy is what you're looking for.
In 3.4.x cached copies are stored in a separate table "cachedcopy".
Cached copies are compressed by default, but compression can
be switched off:

http://www.mnogosearch.org/doc34/msearch-cmdref-cachedcopyencoding.html
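
As a rough illustration of reading a page back out of the "cachedcopy" table, here is a minimal Python sketch. Only the table name comes from this message. The column names "url_id" and "content", the use of SQLite, and the guess that compressed rows are plain zlib streams are all assumptions made for the example, so check your actual schema and the CachedCopyEncoding setting before relying on it.

```python
import sqlite3
import zlib


def read_cached_html(conn, url_id):
    """Return the stored page for url_id as text, or None if there is no row.

    Assumptions (not confirmed in this thread): the table has columns
    named "url_id" and "content", and compressed rows are zlib streams.
    """
    row = conn.execute(
        "SELECT content FROM cachedcopy WHERE url_id = ?", (url_id,)
    ).fetchone()
    if row is None:
        return None
    raw = row[0]
    if isinstance(raw, str):
        return raw  # already stored as text
    raw = bytes(raw)
    try:
        raw = zlib.decompress(raw)
    except zlib.error:
        pass  # not a zlib stream: treat the bytes as uncompressed
    return raw.decode("utf-8", errors="replace")
```

With compression switched off, the fallback branch simply returns the stored bytes decoded as text, so the same helper would cover both cases under these assumptions.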


> 
> Is there any way to do this?
> 
> I've tried this in indexer.conf, but it doesn't work:
> 
> Section headhtml   25 2058 "]*)>(*.)" $2
> Section bodyhtml   26 2058 "]*)>(*.)" $2
> Section htmlcode25 2058 "]*)>(*.)" $2
> 
> Section body1   2018afterheadershtml
> gets the body, but with all HTML tags stripped out :(
> 
> 
> Thank you for your help
> 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


[General] Webboard: Index full html code in DDB

2016-05-30 Thread bar
Author: rafikCyc
Email: rafikothm...@gmail.com
Message:
Hello,

I would like to crawl the whole HTML code for each URL.

Is there any way to do this?

I've tried this in indexer.conf, but it doesn't work:

Section headhtml   25 2058 "]*)>(*.)" $2
Section bodyhtml   26 2058 "]*)>(*.)" $2
Section htmlcode25 2058 "]*)>(*.)" $2

Section body1   2018afterheadershtml
gets the body, but with all HTML tags stripped out :(


Thank you for your help

