Kir, thank you for your reply.

In my installation, dlog.log is found in /usr/local/aspseek/var/, not 
/usr/local/aspseek/var/aspseek12/. The dlog.log file seems to only be updated when a 
search is done, not when the index is produced. When the index is produced, the file 
/usr/local/aspseek/var/aspseek12/logs.txt is updated with these lines:
New indexing session started at: 1042470091
Got next     96 URLs for:   0.011 seconds. Queued docs:    96.Time 0-1042470091.

This was for the command 'sbin/index -a -m -u "http://www.jhuccp.org/popreporter/%";'.
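
As an aside, the number after "New indexing session started at:" looks like a Unix epoch timestamp. Assuming that reading is right, a small Python sketch can turn it into a readable UTC time, which helps match a logs.txt entry to a particular index run. The function name decode_session_start is made up for this illustration and is not part of aspseek.

```python
from datetime import datetime, timezone

PREFIX = "New indexing session started at:"


def decode_session_start(line: str) -> datetime:
    """Parse a 'New indexing session started at: <epoch>' line into a UTC datetime.

    Assumption: the trailing number is seconds since the Unix epoch.
    """
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError(f"not a session-start line: {line!r}")
    epoch_seconds = int(line[len(PREFIX):].strip())
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    sample = "New indexing session started at: 1042470091"
    print(decode_session_start(sample).isoformat())
```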

Every time a search is done, var/dlog.log is updated with this line:
Subset http://www.jhuccp.org/ not found
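
To see which subset strings searchd is complaining about, and how often, the log lines can be tallied with a short script. This is only a sketch: it assumes every relevant line has exactly the "Subset <mask> not found" shape shown above, and count_missing_subsets is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines of the form shown above: "Subset <mask> not found"
SUBSET_NOT_FOUND = re.compile(r"^Subset (\S+) not found$")


def count_missing_subsets(lines: Iterable[str]) -> Counter:
    """Count how many times each subset mask is reported as not found."""
    counts: Counter = Counter()
    for line in lines:
        match = SUBSET_NOT_FOUND.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines standing in for the contents of var/dlog.log.
    sample = [
        "Subset http://www.jhuccp.org/ not found",
        "Subset http://www.jhuccp.org/ not found",
    ]
    for mask, n in count_missing_subsets(sample).most_common():
        print(f"{n:4d}  {mask}")
```

Passing an open file handle for var/dlog.log instead of the sample list works the same way, since the function accepts any iterable of lines.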

This is so even though the subset is 'http://www.jhuccp.org/popreporter/%' and not 
just 'http://www.jhuccp.org'. 
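
One thing that may be worth ruling out: in the hand-edited search URL, the trailing '%' and the ':' and '/' characters in the ul value are sent unescaped, and a bare '%' is not a valid percent-escape in a query string. The sketch below builds the same request with standard escaping using Python's urllib. Whether s.cgi decodes the ul parameter this way, and whether this has anything to do with the shortened subset in dlog.log, is an assumption and not something confirmed here. The parameter names (q, cs, ps, o, ul) are taken from the URL quoted below; build_search_url is a made-up helper name.

```python
from urllib.parse import urlencode


def build_search_url(base: str, query: str, subset_mask: str) -> str:
    """Build a search URL with every parameter value percent-encoded."""
    params = {
        "q": query,
        "cs": "",
        "ps": 20,
        "o": 0,
        "ul": subset_mask,  # '%' becomes '%25', '/' becomes '%2F', ':' becomes '%3A'
    }
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(
        build_search_url(
            "http://www.jhuccp.org/cgi-bin/s.cgi",
            "advocacy",
            "http://www.jhuccp.org/popreporter/%",
        )
    )
```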

I've killed and restarted searchd a couple of times without any change to the results.

Thank you, again, for any help and suggestions.

-Kevin Zembower

>>> [EMAIL PROTECTED] 01/13/03 10:12AM >>>
Looks like you have done everything right. Hmm... could you check searchd's log 
file /usr/local/aspseek/var/aspseek12/dlog.log for some "Subset not found"
messages? Also, as a last resort, try restarting searchd....

KEVIN ZEMBOWER wrote:
> I'm trying to restrict the found documents to ones in a particular directory. Our 
>aspseek search engine is at http://www.jhuccp.org/cgi-bin/s.cgi. 
> 
> If you enter a search term like 'advocacy', you should get a return of about 424 
>documents. To do this, aspseek uses this URL: 
> http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0 
> 
> We want to limit the found documents to the ones that have 'advocacy' in them in the 
>/popreporter/ directory. To do this, I created this record in MySQL: 
> www:/usr/local/aspseek/etc# mysql -u aspseek12 -p aspseek12 
> mysql> select * from subsets; 
> +-----------+-------------------------------------+
> | subset_id | mask                                |
> +-----------+-------------------------------------+
> |         2 | http://www.jhuccp.org/popreporter/% |
> +-----------+-------------------------------------+
> 
> When I run index -B, I get: 
> www:/usr/local/aspseek/etc# su - -s /bin/bash aspseek 
> aspseek@www:~$ sbin/index -B 
> Loading configuration from /usr/local/aspseek/etc/db.conf 
> Loading configuration from /usr/local/aspseek/etc/ucharset.conf 
> Loading configuration from /usr/local/aspseek/etc/stopwords.conf 
> Loading configuration from /usr/local/aspseek/etc/aspseek.conf 
> Generating subset http://www.jhuccp.org/popreporter/% ... done (96 URLs) 
> index process finished. 
> aspseek@www:~$ 
> 
> This seems to indicate that I've got the subset set up correctly. 
> 
> Then, to test this, I manually edit the URL in the browser's location box to: 
> 
>http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0&ul=http://www.jhuccp.org/popreporter/%
> 
> I've tried variations on this, such as putting the URL in quotes, just using 
>'/popreporter/' etc. Still no joy.
> 
> When I submit it, it returns the same 424 documents as before; no restriction to the 
>/popreporter/ directory is done. 
> 
> I've read in some of the posts to this list that the subset should be set up without 
>the '%', so I also tried that:
> aspseek@www:~$ mysql -u aspseek12 -p aspseek12
> Enter password: 
> mysql> select * from subsets;
> +-----------+------------------------------------+
> | subset_id | mask                               |
> +-----------+------------------------------------+
> |         1 | http://www.jhuccp.org/popreporter/ |
> +-----------+------------------------------------+
> 1 row in set (0.00 sec)
> 
> Then I run:
> aspseek@www:~$ sbin/index -a -m -u "http://www.jhuccp.org/popreporter/%";
> Loading configuration from /usr/local/aspseek/etc/db.conf
> Loading configuration from /usr/local/aspseek/etc/ucharset.conf
> Loading configuration from /usr/local/aspseek/etc/stopwords.conf
> Loading configuration from /usr/local/aspseek/etc/aspseek.conf
> Adding URL: http://www.jhuccp.org/popreporter/current.shtml 
> Adding URL: http://www.jhuccp.org/popreporter/subscribe.shtml 
> Adding URL: http://www.jhuccp.org/popreporter/index.shtml 
> Adding URL: http://www.jhuccp.org/popreporter/2002/02-25.shtml 
> <snip>
> Adding URL: http://www.jhuccp.org/popreporter/2001/06-11.shtml 
> Adding URL: http://www.jhuccp.org/popreporter/2001/06-04.shtml 
> Saving real-time database ... done.
> Saving delta files [..................................................] done.
> Deleting 'deleted' records from urlword[s] ... done. (0 records deleted)
> Saving real-time ... done
> Saving redirects ... done
> Splitting href delta file ... done
> Saving href delta files ... done
> Saving direct href delta files ... done
> Calculating ranks  [................................................] done.
> Saving lastmods ... done
> Generating word site ... done
> Generating subset http://www.jhuccp.org/popreporter/ ... done (0 URLs)
> index process finished.
> aspseek@www:~$ 
> 
> The dlog.log says, "Subset http://www.jhuccp.org/ not found". Yet, the index command 
>suggests that it found plenty.
> 
> Could someone please set me straight on how this should work? Thank you very much 
>for your help. 
> 
> -Kevin Zembower 
> 


-- 
== kir_at_asplinux.ru == 7551596_at_ICQ == 6722750_at_sms.beemail.ru ==

Dream like you'll live forever...Love like you've never been hurt...
Work like you don't need the money...and Dance like nobody is watching!
        -- Satchel Paige
