Author: fabien
Email: [email protected]
Message:

Thanks for your quick answer.
I tried to add the NoIndexIf command, but I cannot get it to work. I used the default indexer.conf file and added the following two lines at the end of that file:

    Server http://www.wearethelous.com/feed/
    NoIndexIf Content-Type application/rss+xml

I got the following log:

    [71598]{--} Clearing
    [71598]{--} Clearing done 0.01
    [71600]{--} indexer from mnogosearch-3.4.1-mysql-pqsql started with '/etc/mnogosearch/indexer.conf'
    [71600]{01} URL: http://www.wearethelous.com/feed/
    [71600]{01} Server Path Allow 'http://www.wearethelous.com/feed/'
    [71600]{01} Allow by default
    [71600]{01} ROBOTS: http://www.wearethelous.com/robots.txt
    [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
    [71600]{01} Request.Host: www.wearethelous.com
    [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
    [71600]{01} Response.Connection: close
    [71600]{01} Response.Content-Encoding: gzip
    [71600]{01} Response.Content-Length: 67
    [71600]{01} Response.Content-Type: text/plain
    [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:46 GMT
    [71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/"
    [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
    [71600]{01} Response.ResponseSize: 475
    [71600]{01} Response.ResponseTime: 2261
    [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
    [71600]{01} Response.Server-Charset: utf-8
    [71600]{01} Response.Status: 200
    [71600]{01} Response.URL: http://www.wearethelous.com/robots.txt
    [71600]{01} Response.URL_ID: 1928115922
    [71600]{01} Response.Vary: Accept-Encoding,User-Agent
    [71600]{01} Response.X-Powered-By: PHP/5.5.29
    [71600]{01} Response.X-Robots-Tag: noindex, follow
    [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
    [71600]{01} Request.Host: www.wearethelous.com
    [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
    [71600]{01} Response.body:
    [71600]{01} Response.Charset:
    [71600]{01} Response.Connection: close
    [71600]{01} Response.Content-Encoding: gzip
    [71600]{01} Response.Content-Language:
    [71600]{01} Response.Content-Length: 2337
    [71600]{01} Response.Content-Type: application/rss+xml
    [71600]{01} Response.crc32: 0
    [71600]{01} Response.crc32old: 0
    [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:48 GMT
    [71600]{01} Response.ETag: "7059155a990290887650add31475f88e"
    [71600]{01} Response.Hops: 0
    [71600]{01} Response.ID: 5
    [71600]{01} Response.ilinktext:
    [71600]{01} Response.Last-Modified: Thu, 29 Sep 2016 12:48:50 GMT
    [71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/"
    [71600]{01} Response.MaxDocPerSite: 0
    [71600]{01} Response.MaxHops: 256
    [71600]{01} Response.meta.description:
    [71600]{01} Response.meta.keywords:
    [71600]{01} Response.msg.from:
    [71600]{01} Response.msg.subject:
    [71600]{01} Response.msg.to:
    [71600]{01} Response.PrevStatus: 0
    [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
    [71600]{01} Response.ResponseSize: 2842
    [71600]{01} Response.ResponseTime: 1455
    [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
    [71600]{01} Response.Server-Charset: utf-8
    [71600]{01} Response.Server_id: -2050898686
    [71600]{01} Response.Status: 200
    [71600]{01} Response.title:
    [71600]{01} Response.URL: http://www.wearethelous.com/feed/
    [71600]{01} Response.url.file:
    [71600]{01} Response.url.host:
    [71600]{01} Response.url.path:
    [71600]{01} Response.url.proto:
    [71600]{01} Response.URL_ID: -2050898686
    [71600]{01} Response.Vary: Accept-Encoding,User-Agent
    [71600]{01} Response.X-Powered-By: PHP/5.5.29
    [71600]{01} Response.X-Robots-Tag: noindex, follow
    [71600]{01} Status: 200 OK
    [71600]{01} Guesser: Lang: , Charset: utf-8
    [71600]{01} SectionFilter: NoIndexIf Match Wild Insensitive 'Content-Type' 'application/rss+xml'
    [71600]{01} Flushing word cache
    [71600]{01} Flushing word cache done 0.00
    [71600]{01} Done (4 seconds, 1 documents, 2842 bytes, 0.69 Kbytes/sec.)

I see that the SectionFilter line mentions the NoIndexIf filter that I added, but the URL is still indexed.
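For reference, the SectionFilter line in the log reads "Match Wild Insensitive", which I take to mean a case-insensitive wildcard comparison. The Python sketch below only illustrates that kind of comparison with the standard `fnmatch` module; it is not mnoGoSearch code, the helper name `wild_insensitive_match` is made up for the example, and whether the indexer compares the header in exactly this way is an assumption. It shows that the pattern as written would match the Content-Type value reported in the log, and that a literal pattern would stop matching if the header carried extra parameters such as a charset.

```python
from fnmatch import fnmatchcase


def wild_insensitive_match(pattern: str, value: str) -> bool:
    """Case-insensitive shell-style wildcard match ('*' and '?').

    Hypothetical helper for illustration only; it does not reproduce
    mnoGoSearch's internal matching code.
    """
    return fnmatchcase(value.lower(), pattern.lower())


# Value exactly as reported in the log: the literal pattern matches.
print(wild_insensitive_match("application/rss+xml", "application/rss+xml"))

# Case differences do not matter for an insensitive match.
print(wild_insensitive_match("application/rss+xml", "Application/RSS+XML"))

# A header with parameters no longer equals the literal pattern...
print(wild_insensitive_match("application/rss+xml",
                             "application/rss+xml; charset=UTF-8"))

# ...but a trailing '*' would cover that case.
print(wild_insensitive_match("application/rss+xml*",
                             "application/rss+xml; charset=UTF-8"))
```

If this reading is right, the pattern itself does not look like the obvious culprit for the value shown in the log.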
So what could be wrong?

Thanks in advance for your help.

Fabien.

> Hi,
>
> > Hi all,
> >
> > Is it possible to exclude certain mime types such as rss feeds ?
>
> This can be done using the NoIndexIf command:
>
> http://www.mnogosearch.org/doc34/msearch-cmdref-noindexif.html
>
> Put this command into indexer.conf to disallow a certain Content-Type:
>
> NoIndexIf Content-Type application/rss+xml
>
> Another option is to use NoIndexIf in a combination with a user defined
> section, to check raw content fragments:
>
> http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-user-defined
>
> The idea is to define a user section using a regex pattern to catch some
> known RSS text fragments, and then use NoIndexIf with this section.
>
> > Thanks in advance,
> > Fabien.

Reply: <http://www.mnogosearch.org/board/message.php?id=21790>
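The second suggestion in the quoted reply (catching known RSS text fragments with a regex) can be tried outside the indexer first. The sketch below is plain Python, not indexer.conf syntax; the pattern, the 2048-character window, and the name `looks_like_rss` are assumptions chosen for illustration. A pattern that behaves well here could then be adapted to a user-defined section as described in the linked documentation.

```python
import re

# Assumed pattern: an opening <rss ...> tag near the start of the document.
# This is only one possible "known RSS fragment"; adjust it to the feeds
# you actually need to exclude.
RSS_FRAGMENT = re.compile(r"<rss\b[^>]*>", re.IGNORECASE)


def looks_like_rss(raw_content: str) -> bool:
    """Return True if the first part of the raw content has an <rss> tag.

    Hypothetical helper for testing the regex; not part of mnoGoSearch.
    """
    return RSS_FRAGMENT.search(raw_content[:2048]) is not None


sample_feed = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">\n'
    "<channel><title>Example</title></channel></rss>"
)
sample_html = "<html><body><p>Subscribe to our RSS feed</p></body></html>"

print(looks_like_rss(sample_feed))
print(looks_like_rss(sample_html))
```

Checking only the opening tag keeps the pattern short and avoids matching ordinary pages that merely mention the word "RSS" in their text.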
