Author: fabien
Email: fabien.lahau...@gmail.com
Message:
To be more precise, what I ultimately want is to index only HTML pages, not
other types of data (css/js/pictures/pdf/rss/...).

Fabien.
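
For reference, a minimal sketch of what such an HTML-only setup might look like
in indexer.conf. It is untested against mnoGoSearch 3.4.1. The extension list is
only an illustrative assumption, and the NoIndexIf line is the one suggested in
the reply quoted below. The default indexer.conf may already contain similar
Disallow lines, so check it before adding duplicates.

```
# Sketch only: the patterns below are assumptions, adjust them to the site.

# Reject URLs by file extension (wildcard match on the URL).
Disallow *.css *.js
Disallow *.gif *.jpg *.jpeg *.png
Disallow *.pdf

# Filter documents by the Content-Type header the server returns.
# This covers feeds such as /feed/, whose URL has no telltale extension.
NoIndexIf Content-Type application/rss+xml
```

Extension-based rules look only at the URL, whereas the Content-Type rule looks
at the response header, so it can only apply once the document has been
requested.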

> Thanks for your quick answer.
> 
> I tried adding NoIndexIf, but I cannot get it to work.
> 
> I used the default indexer.conf file and added the following two lines at
> the end of it:
> Server http://www.wearethelous.com/feed/
> NoIndexIf Content-Type application/rss+xml
> 
> I got the following log:
> 
> [71598]{--} Clearing
> [71598]{--} Clearing done       0.01
> [71600]{--} indexer from mnogosearch-3.4.1-mysql-pqsql started with '/etc/mnogosearch/indexer.conf'
> [71600]{01} URL: http://www.wearethelous.com/feed/
> [71600]{01} Server Path Allow 'http://www.wearethelous.com/feed/'
> [71600]{01} Allow by default
> [71600]{01} ROBOTS: http://www.wearethelous.com/robots.txt
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Length: 67
> [71600]{01} Response.Content-Type: text/plain
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:46 GMT
> [71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/";
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 475
> [71600]{01} Response.ResponseTime: 2261
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Status: 200
> [71600]{01} Response.URL: http://www.wearethelous.com/robots.txt
> [71600]{01} Response.URL_ID: 1928115922
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Request.Accept-Encoding: gzip,deflate,compress
> [71600]{01} Request.Host: www.wearethelous.com
> [71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
> [71600]{01} Response.body: 
> [71600]{01} Response.Charset: 
> [71600]{01} Response.Connection: close
> [71600]{01} Response.Content-Encoding: gzip
> [71600]{01} Response.Content-Language: 
> [71600]{01} Response.Content-Length: 2337
> [71600]{01} Response.Content-Type: application/rss+xml
> [71600]{01} Response.crc32: 0
> [71600]{01} Response.crc32old: 0
> [71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:48 GMT
> [71600]{01} Response.ETag: "7059155a990290887650add31475f88e"
> [71600]{01} Response.Hops: 0
> [71600]{01} Response.ID: 5
> [71600]{01} Response.ilinktext: 
> [71600]{01} Response.Last-Modified: Thu, 29 Sep 2016 12:48:50 GMT
> [71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/";
> [71600]{01} Response.MaxDocPerSite: 0
> [71600]{01} Response.MaxHops: 256
> [71600]{01} Response.meta.description: 
> [71600]{01} Response.meta.keywords: 
> [71600]{01} Response.msg.from: 
> [71600]{01} Response.msg.subject: 
> [71600]{01} Response.msg.to: 
> [71600]{01} Response.PrevStatus: 0
> [71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
> [71600]{01} Response.ResponseSize: 2842
> [71600]{01} Response.ResponseTime: 1455
> [71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
> [71600]{01} Response.Server-Charset: utf-8
> [71600]{01} Response.Server_id: -2050898686
> [71600]{01} Response.Status: 200
> [71600]{01} Response.title: 
> [71600]{01} Response.URL: http://www.wearethelous.com/feed/
> [71600]{01} Response.url.file: 
> [71600]{01} Response.url.host: 
> [71600]{01} Response.url.path: 
> [71600]{01} Response.url.proto: 
> [71600]{01} Response.URL_ID: -2050898686
> [71600]{01} Response.Vary: Accept-Encoding,User-Agent
> [71600]{01} Response.X-Powered-By: PHP/5.5.29
> [71600]{01} Response.X-Robots-Tag: noindex, follow
> [71600]{01} Status: 200 OK
> [71600]{01} Guesser: Lang: , Charset: utf-8
> [71600]{01} SectionFilter: NoIndexIf Match Wild Insensitive 'Content-Type' 'application/rss+xml'
> [71600]{01} Flushing word cache
> [71600]{01} Flushing word cache done    0.00
> [71600]{01} Done (4 seconds, 1 documents, 2842 bytes,  0.69 Kbytes/sec.)
> 
> I see that the SectionFilter line mentions the NoIndexIf filter that I added,
> but the URL is still indexed.
> So what could be wrong?
> 
> Thanks in advance for your help.
> Fabien.
> 
> 
> > Hi,
> > 
> > > Hi all,
> > > 
> > > Is it possible to exclude certain MIME types, such as RSS feeds?
> > > 
> > 
> > This can be done using the NoIndexIf command:
> > 
> > http://www.mnogosearch.org/doc34/msearch-cmdref-noindexif.html
> > 
> > Put this command into indexer.conf to disallow a certain Content-Type:
> > 
> > NoIndexIf Content-Type application/rss+xml
> > 
> > 
> > Another option is to use NoIndexIf in combination with a user-defined
> > section to check raw content fragments:
> > 
> > http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-user-defined
> > 
> > The idea is to define a user section using a regex pattern to catch some 
> > known RSS text fragments, and then use NoIndexIf with this section.
> > 
> > 
> > > Thanks in advance,
> > > Fabien.
> > 

Reply: <http://www.mnogosearch.org/board/message.php?id=21791>

