Kir, thank you for your reply.

In my installation, dlog.log is found in /usr/local/aspseek/var/, not /usr/local/aspseek/var/aspseek12/. The dlog.log file seems to only be updated when a search is done, not when the index is produced. When the index is produced, the file /usr/local/aspseek/var/aspseek12/logs.txt is updated with these lines:

New indexing session started at: 1042470091
Got next 96 URLs for: 0.011 seconds. Queued docs: 96.Time 0-1042470091.

This was for the command 'sbin/index -a -m -u "http://www.jhuccp.org/popreporter/%"'.

Every time a search is done, var/dlog.log is updated with this line:

Subset http://www.jhuccp.org/ not found

This is so even though the subset is 'http://www.jhuccp.org/popreporter/%' and not just 'http://www.jhuccp.org'. I've killed and restarted searchd a couple of times without any change to the results.

Thank you, again, for any help and suggestions.

-Kevin Zembower

>>> [EMAIL PROTECTED] 01/13/03 10:12AM >>>
Looks like you have done everything right. Hmm...could you check searchd's log file /usr/local/aspseek/var/aspseek12/dlog.log for some "Subset not found" messages?

Also, as a last resort, try restarting searchd....

KEVIN ZEMBOWER wrote:
> I'm trying to restrict the found documents to ones in a particular directory. Our aspseek search engine is at http://www.jhuccp.org/cgi-bin/s.cgi.
>
> If you enter a search term like 'advocacy', you should get a return of about 424 documents. To do this, aspseek uses this URL:
> http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0
>
> We want to limit the found documents to the ones that have 'advocacy' in them in the /popreporter/ directory.
> To do this, I created this record in MySQL:
> www:/usr/local/aspseek/etc# mysql -u aspseek12 -p aspseek12
> mysql> select * from subsets;
> +-----------+-------------------------------------+
> | subset_id | mask                                |
> +-----------+-------------------------------------+
> |         2 | http://www.jhuccp.org/popreporter/% |
> +-----------+-------------------------------------+
>
> When I run index -B, I get:
> www:/usr/local/aspseek/etc# su - -s /bin/bash aspseek
> aspseek@www:~$ sbin/index -B
> Loading configuration from /usr/local/aspseek/etc/db.conf
> Loading configuration from /usr/local/aspseek/etc/ucharset.conf
> Loading configuration from /usr/local/aspseek/etc/stopwords.conf
> Loading configuration from /usr/local/aspseek/etc/aspseek.conf
> Generating subset http://www.jhuccp.org/popreporter/% ... done (96 URLs)
> index process finished.
> aspseek@www:~$
>
> This seems to indicate that I've got the subset set up correctly.
>
> Then, to test this, I manually edit the URL in the browser's location box to:
> http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0&ul=http://www.jhuccp.org/popreporter/%
>
> I've tried variations on this, such as putting the URL in quotes, just using '/popreporter/' etc. Still no joy.
>
> When I submit it, it returns the same 424 documents as before; no restriction to the /popreporter/ directory is done.
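
A note on the hand-edited test URL: in a URL query string, '%' introduces a percent-escape, so a literal '%' at the end of the ul value would normally have to be sent as '%25'. Whether this is what keeps s.cgi from matching the subset is only a guess; nothing in the logs confirms it. A minimal Python sketch that builds the same query with every value escaped might look like this (build_search_url is a hypothetical helper; the parameter names q, cs, ps, o and ul are simply the ones seen in the URLs in this thread):

```python
from urllib.parse import urlencode

# Base address of the search CGI, as given in this thread.
SEARCH_CGI = "http://www.jhuccp.org/cgi-bin/s.cgi"


def build_search_url(query, subset_mask, page_size=20, offset=0):
    """Return a search URL whose parameter values are percent-encoded.

    Hypothetical helper for illustration only; it mirrors the q, cs, ps,
    o and ul parameters seen in the URLs in this thread.
    """
    params = [
        ("q", query),
        ("cs", ""),
        ("ps", str(page_size)),
        ("o", str(offset)),
        ("ul", subset_mask),  # a literal '%' in the mask becomes '%25'
    ]
    # urlencode escapes reserved characters in each value (':' '/' '%').
    return SEARCH_CGI + "?" + urlencode(params)


url = build_search_url("advocacy", "http://www.jhuccp.org/popreporter/%")
print(url)
```

Pasting the printed URL into the browser's location box is one way to rule escaping in or out as the cause before looking further at the subsets table.
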
>
> I've read in some of the posts to this list that the subset should be set up without the '%', so I also tried that:
> aspseek@www:~$ mysql -u aspseek12 -p aspseek12
> Enter password:
> mysql> select * from subsets;
> +-----------+------------------------------------+
> | subset_id | mask                               |
> +-----------+------------------------------------+
> |         1 | http://www.jhuccp.org/popreporter/ |
> +-----------+------------------------------------+
> 1 row in set (0.00 sec)
>
> Then I run:
> aspseek@www:~$ sbin/index -a -m -u "http://www.jhuccp.org/popreporter/%"
> Loading configuration from /usr/local/aspseek/etc/db.conf
> Loading configuration from /usr/local/aspseek/etc/ucharset.conf
> Loading configuration from /usr/local/aspseek/etc/stopwords.conf
> Loading configuration from /usr/local/aspseek/etc/aspseek.conf
> Adding URL: http://www.jhuccp.org/popreporter/current.shtml
> Adding URL: http://www.jhuccp.org/popreporter/subscribe.shtml
> Adding URL: http://www.jhuccp.org/popreporter/index.shtml
> Adding URL: http://www.jhuccp.org/popreporter/2002/02-25.shtml
> <snip>
> Adding URL: http://www.jhuccp.org/popreporter/2001/06-11.shtml
> Adding URL: http://www.jhuccp.org/popreporter/2001/06-04.shtml
> Saving real-time database ... done.
> Saving delta files [..................................................] done.
> Deleting 'deleted' records from urlword[s] ... done. (0 records deleted)
> Saving real-time ... done
> Saving redirects ... done
> Splitting href delta file ... done
> Saving href delta files ... done
> Saving direct href delta files ... done
> Calculating ranks [................................................] done.
> Saving lastmods ... done
> Generating word site ... done
> Generating subset http://www.jhuccp.org/popreporter/ ... done (0 URLs)
> index process finished.
> aspseek@www:~$
>
> The dlog.log says, "Subset http://www.jhuccp.org/ not found". Yet, the index command suggests that it found plenty.
>
> Could someone please set me straight on how this should work? Thank you very much for your help.
>
> -Kevin Zembower
>

--
== kir_at_asplinux.ru == 7551596_at_ICQ == 6722750_at_sms.beemail.ru ==
Dream like you'll live forever...Love like you've never been hurt...
Work like you don't need the money...and Dance like nobody is watching!
-- Satchel Paige
