Author: fabien
Email: fabien.lahau...@gmail.com
Message:
Thanks for your quick answer.

I tried adding the NoIndexIf command, but I cannot get it to work.

I used the default indexer.conf file and added the following two lines at the
end of it:
Server http://www.wearethelous.com/feed/
NoIndexIf Content-Type application/rss+xml

I got the following log:

[71598]{--} Clearing
[71598]{--} Clearing done       0.01
[71600]{--} indexer from mnogosearch-3.4.1-mysql-pqsql started with '/etc/mnogosearch/indexer.conf'
[71600]{01} URL: http://www.wearethelous.com/feed/
[71600]{01} Server Path Allow 'http://www.wearethelous.com/feed/'
[71600]{01} Allow by default
[71600]{01} ROBOTS: http://www.wearethelous.com/robots.txt
[71600]{01} Request.Accept-Encoding: gzip,deflate,compress
[71600]{01} Request.Host: www.wearethelous.com
[71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
[71600]{01} Response.Connection: close
[71600]{01} Response.Content-Encoding: gzip
[71600]{01} Response.Content-Length: 67
[71600]{01} Response.Content-Type: text/plain
[71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:46 GMT
[71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/";
[71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
[71600]{01} Response.ResponseSize: 475
[71600]{01} Response.ResponseTime: 2261
[71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
[71600]{01} Response.Server-Charset: utf-8
[71600]{01} Response.Status: 200
[71600]{01} Response.URL: http://www.wearethelous.com/robots.txt
[71600]{01} Response.URL_ID: 1928115922
[71600]{01} Response.Vary: Accept-Encoding,User-Agent
[71600]{01} Response.X-Powered-By: PHP/5.5.29
[71600]{01} Response.X-Robots-Tag: noindex, follow
[71600]{01} Request.Accept-Encoding: gzip,deflate,compress
[71600]{01} Request.Host: www.wearethelous.com
[71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
[71600]{01} Response.body: 
[71600]{01} Response.Charset: 
[71600]{01} Response.Connection: close
[71600]{01} Response.Content-Encoding: gzip
[71600]{01} Response.Content-Language: 
[71600]{01} Response.Content-Length: 2337
[71600]{01} Response.Content-Type: application/rss+xml
[71600]{01} Response.crc32: 0
[71600]{01} Response.crc32old: 0
[71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:48 GMT
[71600]{01} Response.ETag: "7059155a990290887650add31475f88e"
[71600]{01} Response.Hops: 0
[71600]{01} Response.ID: 5
[71600]{01} Response.ilinktext: 
[71600]{01} Response.Last-Modified: Thu, 29 Sep 2016 12:48:50 GMT
[71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/";
[71600]{01} Response.MaxDocPerSite: 0
[71600]{01} Response.MaxHops: 256
[71600]{01} Response.meta.description: 
[71600]{01} Response.meta.keywords: 
[71600]{01} Response.msg.from: 
[71600]{01} Response.msg.subject: 
[71600]{01} Response.msg.to: 
[71600]{01} Response.PrevStatus: 0
[71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
[71600]{01} Response.ResponseSize: 2842
[71600]{01} Response.ResponseTime: 1455
[71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
[71600]{01} Response.Server-Charset: utf-8
[71600]{01} Response.Server_id: -2050898686
[71600]{01} Response.Status: 200
[71600]{01} Response.title: 
[71600]{01} Response.URL: http://www.wearethelous.com/feed/
[71600]{01} Response.url.file: 
[71600]{01} Response.url.host: 
[71600]{01} Response.url.path: 
[71600]{01} Response.url.proto: 
[71600]{01} Response.URL_ID: -2050898686
[71600]{01} Response.Vary: Accept-Encoding,User-Agent
[71600]{01} Response.X-Powered-By: PHP/5.5.29
[71600]{01} Response.X-Robots-Tag: noindex, follow
[71600]{01} Status: 200 OK
[71600]{01} Guesser: Lang: , Charset: utf-8
[71600]{01} SectionFilter: NoIndexIf Match Wild Insensitive 'Content-Type' 'application/rss+xml'
[71600]{01} Flushing word cache
[71600]{01} Flushing word cache done    0.00
[71600]{01} Done (4 seconds, 1 documents, 2842 bytes,  0.69 Kbytes/sec.)

I can see that the SectionFilter line in the log mentions the NoIndexIf filter
that I added, but the URL is still indexed.
What could be wrong?

Thanks in advance for your help.
Fabien.


> Hi,
> 
> > Hi all,
> > 
> > Is it possible to exclude certain MIME types, such as RSS feeds?
> > 
> 
> This can be done using the NoIndexIf command:
> 
> http://www.mnogosearch.org/doc34/msearch-cmdref-noindexif.html
> 
> Put this command into indexer.conf to disallow a certain Content-Type:
> 
> NoIndexIf Content-Type application/rss+xml
> 
> 
> Another option is to use NoIndexIf in combination with a user-defined
> section, to check raw content fragments:
> 
> http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-user-defined
> 
> The idea is to define a user section using a regex pattern to catch some 
> known RSS text fragments, and then use NoIndexIf with this section.
> 
> 
> > Thanks in advance,
> > Fabien.
> 
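
For reference, the second idea in the quoted reply (catching known RSS text
fragments with a regex pattern) can be sketched in plain Python. This is not
mnoGoSearch configuration and does not use any mnoGoSearch API. It is only a
rough illustration of the matching logic, and the pattern, the function name
looks_like_feed, the probe_chars parameter and the sample strings are all
assumptions made up for this example.

```python
import re

# Hypothetical pattern: root elements that typically open an RSS, Atom or
# RDF feed. A real setup may need a different or stricter pattern.
FEED_FRAGMENT = re.compile(r"<(rss|feed|rdf:RDF)[\s>]", re.IGNORECASE)


def looks_like_feed(raw_content: str, probe_chars: int = 1024) -> bool:
    """Return True if the start of the raw content contains a feed root tag."""
    # Only the beginning of the document is inspected, since the root
    # element of a feed appears right after the XML declaration.
    return FEED_FRAGMENT.search(raw_content[:probe_chars]) is not None


# Made-up sample documents, used only to exercise the function.
sample_feed = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<rss version="2.0"><channel></channel></rss>'
)
sample_page = "<html><head><title>Example</title></head><body></body></html>"

if __name__ == "__main__":
    print(looks_like_feed(sample_feed))
    print(looks_like_feed(sample_page))
```

A document flagged this way would be the one to exclude. How that check maps
onto a user-defined section plus NoIndexIf is described in the documentation
linked in the quoted reply.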

Reply: <http://www.mnogosearch.org/board/message.php?id=21790>

_______________________________________________
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
