Hi,

I just want to verify that the following are not simple slip-ups on my part before I submit them as bugs. I am using mnoGoSearch 3.2.3, MySQL build, to index and search one web site. I am using the crc-multi storage mode.

The first problem concerns how mnoGS deals with the robots.txt file. I have turned off robots in the configuration file, as well as in my entry in the server table for the site I am indexing. To be clear, my robots file prohibits indexing of a directory called "archives". When indexing the site, it appears that the entire site is being indexed. When searching, however, no results from the "archives" directory can be found. When I examined the URL table, the entries for the "archives" directory seemed to be fully intact. Why is it, then, that a search with search.cgi returns no results for this directory? I have verified that search.htm is properly configured.

The second problem concerns how mnoGS indexes a web site. When indexing, mnoGS correctly follows and indexes all links found on the web site. But it also looks for "hidden" pages to index. mnoGS will index the URL http://site/dir/file.html after it finds a link to that URL on (for argument's sake) http://site/file.html. This is all good. But it seems that mnoGS will then also index http://site/dir/ even though there is no link to it. If http://site/dir/ does not have an index file, Apache will generate a listing of the available files and directories. For argument's sake, let's say that the Apache listing of http://site/dir/ includes a directory called "_notes" (a common work directory generated by Dreamweaver). The "_notes" subdirectory is then indexed, and the index is filled with garbage data. I am guessing that this is a problem with how I have mnoGS configured. How can I ensure that mnoGS sticks to indexing only the files it finds links to?

Thanks for your time,

Michael Caplan
Institute for Social Ecology
http://www.social-ecology.org/
1118 Maple Hill Road
Plainfield, VT, 05667 USA
