Author: fabien
Email: [email protected]
Message:
To be more precise, what I ultimately want is to index only HTML pages and none of the other content types (css/js/pictures/pdf/rss/...).
Fabien.

> Thanks for your quick answer.
>
> I tried to add the NoIndexIf but i cannot get it to work.
>
> I used the indexer.conf default file, and added the two following lines at
> the end of that file :
> Server http://www.wearethelous.com/feed/
> NoIndexIf Content-Type application/rss+xml
>
> I got the following log :
>
> [71598]{--} Clearing
> [71598]{--} Clearing done 0.01
> [71600]{--} indexer from mnogosearch-3.4.1-mysql-pqsql started with
> '/etc/mnogosearch/indexer.conf'
> [71600]{01} URL: http://www.wearethelous.com/feed/
> [71600]{01} Server Path Allow 'http://www.wearethelous.com/feed/'
> [71600]{01} Allow by default
> [71600]{01} ROBOTS: http://www.wearethelous.com/robots.txt
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Length: 67
> [71600]{01} Response.Content-Type: text/plain
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:46 GMT
> [71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>;
> rel="https://api.w.org/"
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 475
> [71600]{01} Response.ResponseTime: 2261
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31
> OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Status: 200
> [71600]{01} Response.URL: http://www.wearethelous.com/robots.txt
> [71600]{01} Response.URL_ID: 1928115922
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.body:
> [71600]{01} Response.Charset:
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Language:
> [71600]{01} Response.Content-Length: 2337
> [71600]{01} Response.Content-Type: application/rss+xml
> [71600]{01} Response.crc32: 0
> [71600]{01} Response.crc32old: 0
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:48 GMT
> [71600]{01} Response.ETag: "7059155a990290887650add31475f88e"
> [71600]{01} Response.Hops: 0
> [71600]{01} Response.ID: 5
> [71600]{01} Response.ilinktext:
> [71600]{01} Response.Last-Modified: Thu, 29 Sep 2016 12:48:50 GMT
> [71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>;
> rel="https://api.w.org/"
> [71600]{01} Response.MaxDocPerSite: 0
> [71600]{01} Response.MaxHops: 256
> [71600]{01} Response.meta.description:
> [71600]{01} Response.meta.keywords:
> [71600]{01} Response.msg.from:
> [71600]{01} Response.msg.subject:
> [71600]{01} Response.msg.to:
> [71600]{01} Response.PrevStatus: 0
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 2842
> [71600]{01} Response.ResponseTime: 1455
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31
> OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Server_id: -2050898686
> [71600]{01} Response.Status: 200
> [71600]{01} Response.title:
> [71600]{01} Response.URL: http://www.wearethelous.com/feed/
> [71600]{01} Response.url.file:
> [71600]{01} Response.url.host:
> [71600]{01} Response.url.path:
> [71600]{01} Response.url.proto:
> [71600]{01} Response.URL_ID: -2050898686
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Status: 200 OK
> [71600]{01} Guesser: Lang: , Charset: utf-8
> [71600]{01} SectionFilter: NoIndexIf Match Wild Insensitive 'Content-Type'
> 'application/rss+xml'
> [71600]{01} Flushing word cache
> [71600]{01} Flushing word cache done 0.00
> [71600]{01} Done (4 seconds, 1 documents, 2842 bytes, 0.69 Kbytes/sec.)
>
> I see that the section filter talks about the NoIndexIf filter that i added,
> but the url is still indexed.
> So what can be wrong ?
>
> Thanks in advance for your help.
> Fabien.
>
>
> > Hi,
> >
> > > Hi all,
> > >
> > > Is it possible to exclude certain mime types such as rss feeds ?
> > >
> >
> > This can be done using the NoIndexIf command:
> >
> > http://www.mnogosearch.org/doc34/msearch-cmdref-noindexif.html
> >
> > Put this command into indexer.conf to disallow a certain Content-Type:
> >
> > NoIndexIf Content-Type application/rss+xml
> >
> >
> > Another option is to use NoIndexIf in a combination with a user defined
> > section, to check raw content fragments:
> >
> > http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-user-defined
> >
> > The idea is to define a user section using a regex pattern to catch some
> > known RSS text fragments, and then use NoIndexIf with this section.
> >
> >
> > > Thanks in advance,
> > > Fabien.
> >

Reply: <http://www.mnogosearch.org/board/message.php?id=21791>
_______________________________________________
General mailing list
[email protected]
http://lists.mnogosearch.org/listinfo/general
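
As a rough illustration of the two ideas discussed in the thread (filtering on the Content-Type header, and catching known RSS text fragments with a regex), here is a minimal Python sketch of the decision logic. It is not mnoGoSearch code and says nothing about how indexer actually evaluates NoIndexIf; the function name should_index, the pattern lists and the 2048-character window are all hypothetical choices made for this example.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical allow-list: the only media types we want to index.
HTML_TYPES = ["text/html", "application/xhtml+xml"]

# Hypothetical regex for fragments that usually open an RSS/Atom/RDF feed.
FEED_FRAGMENT = re.compile(r"<(rss|feed|rdf:RDF)[\s>]", re.IGNORECASE)


def should_index(content_type: str, body: str) -> bool:
    """Return True only for HTML responses that do not look like a feed."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()

    # Step 1: header check (wildcard-style match against the allow-list).
    if not any(fnmatchcase(media_type, pattern) for pattern in HTML_TYPES):
        return False

    # Step 2: content check, in case a feed is served with an HTML type.
    # Only the first 2048 characters are inspected (an arbitrary limit).
    return FEED_FRAGMENT.search(body[:2048]) is None


if __name__ == "__main__":
    print(should_index("text/html; charset=utf-8", "<html><body>hi</body></html>"))
    print(should_index("application/rss+xml", "<rss version='2.0'></rss>"))
```

The header check alone handles the feed URL from the log above, since that response carries Content-Type application/rss+xml; the body check only matters for servers that label feeds incorrectly.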
