Hi,

I want to verify that these are not simply slip-ups on my part before I
submit them as bugs.  I am using mnoGoSearch 3.2.3, MySQL build, to index
and search one web site.  I am using the storage mode crc-multi.

The first problem concerns how mnoGS deals with the robots.txt file.  I
have turned off robots in the configuration file, as well as in my entry in
the Server table for the site I am indexing.  To be clear, my robots file
prohibits indexing of a directory called "archives."  When indexing the
site, it appears that the entire site is being indexed.  When searching,
however, no results for the "archives" directory can be found.  When I
examined the URL table, the "archives" directory seemed to be fully intact.
Why, then, does a search run through search.cgi return no results for this
directory?  I have verified that search.htm is properly configured.
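
For reference, the rule in my robots file amounts to the following.  This is
only a sketch that checks the rule with Python's standard
urllib.robotparser module; it says nothing about how mnoGS itself parses
the file.  The exact path "/archives/", the host name "site", and the helper
name is_allowed are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file may differ in path or casing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
"""


def is_allowed(url, agent="*"):
    """Return True if ROBOTS_TXT permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

With robots handling switched off, I would expect a URL under the archives
directory to be both indexed and searchable, even though is_allowed()
reports it as disallowed.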

The second problem concerns how mnoGS indexes a web site.  When indexing,
mnoGS correctly follows and indexes all links found on the web site.  But
it also looks for "hidden" pages to index.  mnoGS will index the URL
http://site/dir/file.html after it finds a link to said URL on (for
argument's sake) http://site/file.html.  This is all good.  But it seems
that mnoGS will then also index http://site/dir/ even though there is no
link to this URL.  If http://site/dir/ does not have an index file, Apache
will generate a listing of the available files and directories.  For
argument's sake, let's say that the Apache listing for http://site/dir/
includes a directory called "_notes" (a common work directory generated by
Dreamweaver).  The "_notes" subdirectory is then indexed, and the index is
filled with garbage data.
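
To make the behaviour I am describing concrete, here is a rough Python
sketch of what the indexer appears to be doing.  This is my guess based on
what ends up in the URL table, not actual mnoGS code, and the function name
implied_directories is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def implied_directories(url):
    """Return the directory URLs implied by the path of `url`.

    The deepest directory comes first; the site root is not included.
    """
    parts = urlsplit(url)
    # Drop the leading empty string and the trailing file name (or the
    # empty string that follows a final slash), keeping directory names.
    segments = parts.path.split("/")[1:-1]
    result = []
    while segments:
        path = "/" + "/".join(segments) + "/"
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments.pop()
    return result
```

So a link to http://site/dir/file.html seems to add http://site/dir/ to the
queue as well, and the Apache listing served at that URL then exposes
everything else in the directory.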

I am guessing that this is a problem with how I have mnoGS configured.  How
can I ensure that mnoGS sticks to indexing only the files it finds links
to?

Thanks for your time,


Michael Caplan
Institute for Social Ecology
http://www.social-ecology.org/

1118 Maple Hill Road
Plainfield, VT, 05667 USA
