Michael Caplan wrote:
> Hi,
>
> Why is it that in the below config file, the Disallow's are ignored when the
> indexer crawls the web sites?

Which mnoGoSearch version do you use?

> Should the Disallow's be placed elsewhere in
> the configuration file so that they are honoured?

Allow/Disallow commands are processed in the order of their appearance in
indexer.conf. Their current position seems to be OK. Run indexer with the -v6
command line argument. It will explain why it accepts or rejects each link,
displaying the Allow/Disallow command chosen for that link.

>
>
> DBAddr mysql://xxx:xxx@localhost/search/
> DBMode multi
> VarDir /usr/local/mnogosearch/var
> StopwordFile stopwords/en.huge.sl
> MinWordLength 2
> MaxDocSize 1548576
> DeleteNoServer yes
> Disallow *.b *.sh *.md5 *.rpm
> Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
> Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
> Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
> Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
> Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
> Disallow *.vrml *.wrl *.png
> Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
> Disallow *.tex *.texi *.texinfo
> Disallow *.cdf *.ps
> Disallow *.ai *.eps *.ppt *.hqx
> Disallow *.cpt *.bms *.oda *.tcl
> Disallow *.o *.a *.la *.so *.log *.LOG *.js
> Disallow *.pat *.pm *.m4 *.am *.css
> Disallow *.map *.aif *.sit *.sea
> Disallow *.m3u *.qt *.mov *.rdf
> Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
> Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
> Disallow */_notes*
> Disallow */login/*
> Disallow */images/*
> Disallow */internal/*
> Disallow */forums/*
> Disallow */wwwthreads/*
> Disallow */ubbthreads/*
> Disallow *links*
> Disallow *mojo.cgi*
> Disallow *_print.html
> Disallow */archives/*
> Disallow */phpweblog/print.php*
> Disallow */phpweblog/friend.php*
> Disallow */phpweblog/search.php*
> Disallow */phpweblog/contrib.php
> Disallow */phpweblog/profiles.php?Author=*
> Disallow */daver/gen/js/d0000/*
> Disallow */daver/gen/js/d0001/*
> Disallow */daver/gen/js/d0002/*
> Disallow */daver/gen/js/d0003/*
> Disallow */daver/gen/js/d0004/*
> Disallow */daver/gen/js/d0005/*
> Disallow */daver/gen/js/index/*
> Disallow */tmp/*
> Disallow */cyberworld/map/*
> Disallow */ise/*
> Disallow */tank/*
> HrefOnly */phpweblog/stories.php?topic=*
> HrefOnly */phpweblog/stories.php?page=*
> HrefOnly */phpweblog/archive.php*
>
> AddType image/x-xpixmap *.xpm
> AddType image/x-xbitmap *.xbm
> AddType image/gif *.gif
> AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
> AddType text/html *.html *.htm *.php *.php3 *.phtml
> *.php$
> AddType text/rtf *.rtf
> AddType application/pdf *.pdf
> AddType application/msword *.doc
> AddType application/vnd.ms-excel *.xls
> AddType text/x-postscript *.ps
> Mime application/msword text/plain "/usr/home/ise/bin/bin/catdoc $1"
> Mime application/pdf text/plain
> "/usr/home/ise/bin/bin/pdftotext $1 -"
> Mime application/vnd.ms-excel text/plain
> "/usr/home/ise/bin/bin/xls2csv $1"
> Mime "text/rtf*" text/html
> "/usr/home/ise/bin/bin/rtf2html $1"
>
> Period 10d
> DefaultLang en
> MaxHops 1000
> MaxNetErrors 32
> ReadTimeOut 90s
> DocTimeOut 1m30s
> NetErrorDelayTime 1d
> Robots yes
> DetectClones yes
>
> Section body 1
> Section title 2
> Section description 3
> Section keywords 4
> Section url:file 5
> Section url:path 0
> Section url:host 6
> Section url:proto 0
> Section crosswords 7
>
> DeleteBad yes
> Index yes
> Follow site
>
> Server site http://www.social-ecology.org/
> Server site http://www.eggplant.ws/
> Server site http://www.infoshop.org/
> Server site http://www.whitecleats.org/
> Server site http://www.anarchosyndicalism.org/
> Server site http://www.leftgreen.org/
> Server site http://www.struggle.ws/
> Server site http://www.houstonabc.org/
> Server site http://www.abolishthebank.org/
> Server site http://www.homedistiller.org/
> Server site http://flag.blackened.net/nefac/
> Server site http://flag.blackened.net/agony/
> Server site http://flag.blackened.net/anarpics/
> Server site http://flag.blackened.net/antinat/
> Server site http://flag.blackened.net/asr/
> Server site http://flag.blackened.net/aca/
> Server site http://flag.blackened.net/blackflag/
> Server site http://flag.blackened.net/biblioteca/
> Server site http://flag.blackened.net/daver/
> Server site http://flag.blackened.net/global/
> Server site http://flag.blackened.net/heatwave/
> Server site http://flag.blackened.net/ias/
> Server site http://flag.blackened.net/kara/
> Server site http://flag.blackened.net/ksl/
> Server site http://flag.blackened.net/liberty/
> Server site http://flag.blackened.net/library/
> Server site http://flag.blackened.net/noterror/
> Server site http://flag.blackened.net/nf/
> Server site http://flag.blackened.net/pdg/
> Server site http://flag.blackened.net/strider/
> Server site http://flag.blackened.net/tolstoy/
> Server site http://flag.blackened.net/wwa/
> Server site http://flag.blackened.net/vrf/
>
> ___________________________________________
> If you want to unsubscribe send "unsubscribe general"
> to [EMAIL PROTECTED]
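
To illustrate why the order of Allow/Disallow commands matters, here is a rough Python sketch of ordered rule processing. It is not mnoGoSearch code. It assumes that the first command whose pattern matches a URL decides the outcome, and it approximates the wildcard matching with Python's fnmatch module, which may differ from the indexer's own matcher. The rule lists and the example.org URLs are made up for the illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rules as (command, pattern) pairs, in the order they
# would appear in the configuration file.
RULES = [
    ("Disallow", "*.gif"),
    ("Disallow", "*/forums/*"),
    ("Allow", "*"),
]


def decide(url, rules):
    """Return the first (command, pattern) pair whose pattern matches url.

    Assumption: first match wins. Returns None when nothing matches;
    this sketch does not model what the indexer does in that case.
    """
    for command, pattern in rules:
        if fnmatchcase(url, pattern):
            return command, pattern
    return None


if __name__ == "__main__":
    for url in (
        "http://www.example.org/forums/index.html",
        "http://www.example.org/logo.gif",
        "http://www.example.org/about.html",
    ):
        print(url, "->", decide(url, RULES))
```

Under the same assumption, moving the catch-all ("Allow", "*") to the top of the list would make every URL match it first, so the Disallow entries below it would never be reached. The -v6 output of the real indexer is the way to check which command is actually selected for a given link.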
