Michael Caplan wrote:

> Hi,
> 
> Why is it that in the config file below, the Disallow commands are ignored when the
> indexer crawls the web sites?  


Which mnoGoSearch version are you using?

>Should the Disallow commands be placed elsewhere in
> the configuration file so that they are honoured?


Allow/Disallow commands are processed in the order in which they
appear in indexer.conf. Where you have placed them looks fine.
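To illustrate what in-order processing means, here is a minimal Python sketch. It is not mnoGoSearch code; it assumes that the first matching Allow/Disallow/HrefOnly rule decides a URL's fate and that non-Regex patterns behave like shell wildcards (`*` = any string, `?` = any one character). Python's fnmatch also understands `[...]` classes, which mnoGoSearch wildcards may not. `parse_rules` and `matching_command` are hypothetical names. The sketch shows how a broad rule placed earlier would shadow every Disallow after it.

```python
import fnmatch
import re
from typing import List, Optional, Tuple

# (command, is_regex, pattern), e.g. ("Disallow", False, "*.gif")
Rule = Tuple[str, bool, str]


def parse_rules(config: str) -> List[Rule]:
    """Expand Allow/Disallow/HrefOnly lines into one rule per pattern, keeping file order."""
    rules: List[Rule] = []
    for line in config.splitlines():
        words = line.split()
        if not words or words[0] not in ("Allow", "Disallow", "HrefOnly"):
            continue  # other indexer.conf commands are ignored by this sketch
        command, args = words[0], words[1:]
        is_regex = bool(args) and args[0] == "Regex"
        if is_regex:
            args = args[1:]
        rules.extend((command, is_regex, pattern) for pattern in args)
    return rules


def matching_command(url: str, rules: List[Rule]) -> Optional[str]:
    """Return the command of the FIRST rule that matches (assumed in-order semantics)."""
    for command, is_regex, pattern in rules:
        if is_regex:
            matched = re.search(pattern, url) is not None
        else:
            # Shell-style wildcards; case-sensitive, '*' also matches '/'.
            matched = fnmatch.fnmatchcase(url, pattern)
        if matched:
            return command
    return None  # no Allow/Disallow/HrefOnly rule matched


# Hypothetical fragments: same rules, different order.
good_order = """\
Disallow *.gif *.jpg
Disallow */images/*
Allow *
"""
bad_order = """\
Allow *
Disallow *.gif *.jpg
Disallow */images/*
"""

url = "http://www.infoshop.org/images/logo.gif"
print(matching_command(url, parse_rules(good_order)))  # Disallow
print(matching_command(url, parse_rules(bad_order)))   # Allow
```

In the second fragment the catch-all `Allow *` matches first, so the later Disallow lines never get a chance; in the first, the Disallow lines are reached before the catch-all.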

Run indexer with the -v6 command-line argument. It will explain
why it accepts or rejects each link, showing which Allow/Disallow
command was chosen for it.


> 
> 
> DBAddr          mysql://xxx:xxx@localhost/search/
> DBMode multi
> VarDir /usr/local/mnogosearch/var
> StopwordFile stopwords/en.huge.sl
> MinWordLength 2
> MaxDocSize 1548576
> DeleteNoServer yes
> Disallow *.b    *.sh   *.md5  *.rpm
> Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z     *.bz2
> Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
> Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm *.pcx
> Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.dat
> Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
> Disallow *.vrml *.wrl  *.png
> Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
> Disallow *.tex  *.texi *.texinfo
> Disallow *.cdf  *.ps
> Disallow *.ai   *.eps  *.ppt  *.hqx
> Disallow *.cpt  *.bms  *.oda  *.tcl
> Disallow *.o    *.a    *.la   *.so *.log *.LOG *.js
> Disallow *.pat  *.pm   *.m4   *.am   *.css
> Disallow *.map  *.aif  *.sit  *.sea
> Disallow *.m3u  *.qt   *.mov  *.rdf
> Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
> Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
> Disallow */_notes*
> Disallow */login/*
> Disallow */images/*
> Disallow */internal/*
> Disallow */forums/*
> Disallow */wwwthreads/*
> Disallow */ubbthreads/*
> Disallow *links*
> Disallow *mojo.cgi*
> Disallow *_print.html
> Disallow */archives/*
> Disallow */phpweblog/print.php*
> Disallow */phpweblog/friend.php*
> Disallow */phpweblog/search.php*
> Disallow */phpweblog/contrib.php
> Disallow */phpweblog/profiles.php?Author=*
> Disallow */daver/gen/js/d0000/*
> Disallow */daver/gen/js/d0001/*
> Disallow */daver/gen/js/d0002/*
> Disallow */daver/gen/js/d0003/*
> Disallow */daver/gen/js/d0004/*
> Disallow */daver/gen/js/d0005/*
> Disallow */daver/gen/js/index/*
> Disallow */tmp/*
> Disallow */cyberworld/map/*
> Disallow */ise/*
> Disallow */tank/*
> HrefOnly */phpweblog/stories.php?topic=*
> HrefOnly */phpweblog/stories.php?page=*
> HrefOnly */phpweblog/archive.php*
> 
> AddType image/x-xpixmap *.xpm
> AddType image/x-xbitmap *.xbm
> AddType image/gif       *.gif
> AddType text/plain                      *.txt  *.pl *.js *.h *.c *.pm *.e
> AddType text/html                       *.html *.htm *.php *.php3 *.phtml *.php$
> AddType text/rtf                        *.rtf
> AddType application/pdf                 *.pdf
> AddType application/msword              *.doc
> AddType application/vnd.ms-excel        *.xls
> AddType text/x-postscript               *.ps
> Mime application/msword      text/plain  "/usr/home/ise/bin/bin/catdoc $1"
> Mime application/pdf          text/plain  "/usr/home/ise/bin/bin/pdftotext $1 -"
> Mime application/vnd.ms-excel text/plain  "/usr/home/ise/bin/bin/xls2csv $1"
> Mime "text/rtf*"                text/html   "/usr/home/ise/bin/bin/rtf2html $1"
> 
> Period 10d
> DefaultLang en
> MaxHops 1000
> MaxNetErrors 32
> ReadTimeOut 90s
> DocTimeOut 1m30s
> NetErrorDelayTime 1d
> Robots yes
> DetectClones yes
> 
> Section body                    1
> Section title                   2
> Section description             3
> Section keywords                4
> Section url:file                5
> Section url:path                0
> Section url:host                6
> Section url:proto               0
> Section crosswords              7
> 
> DeleteBad yes
> Index yes
> Follow site
> 
> Server site http://www.social-ecology.org/
> Server site http://www.eggplant.ws/
> Server site http://www.infoshop.org/
> Server site http://www.whitecleats.org/
> Server site http://www.anarchosyndicalism.org/
> Server site http://www.leftgreen.org/
> Server site http://www.struggle.ws/
> Server site http://www.houstonabc.org/
> Server site http://www.abolishthebank.org/
> Server site http://www.homedistiller.org/
> Server site http://flag.blackened.net/nefac/
> Server site http://flag.blackened.net/agony/
> Server site http://flag.blackened.net/anarpics/
> Server site http://flag.blackened.net/antinat/
> Server site http://flag.blackened.net/asr/
> Server site http://flag.blackened.net/aca/
> Server site http://flag.blackened.net/blackflag/
> Server site http://flag.blackened.net/biblioteca/
> Server site http://flag.blackened.net/daver/
> Server site http://flag.blackened.net/global/
> Server site http://flag.blackened.net/heatwave/
> Server site http://flag.blackened.net/ias/
> Server site http://flag.blackened.net/kara/
> Server site http://flag.blackened.net/ksl/
> Server site http://flag.blackened.net/liberty/
> Server site http://flag.blackened.net/library/
> Server site http://flag.blackened.net/noterror/
> Server site http://flag.blackened.net/nf/
> Server site http://flag.blackened.net/pdg/
> Server site http://flag.blackened.net/strider/
> Server site http://flag.blackened.net/tolstoy/
> Server site http://flag.blackened.net/wwa/
> Server site http://flag.blackened.net/vrf/
> 
> ___________________________________________
> If you want to unsubscribe send "unsubscribe general"
> to [EMAIL PROTECTED]


