Hello,

First of all, thanks for the great engine. I had been using udmsearch for
indexing about 2M sites in Slovenia (.si), and aspseek performs much better
at this scale. However, I have some questions about it.

1) Handling deleted documents: are they never removed from the database?
There is also a problem with robots.txt entries that have the deleted=1 flag.
It seems that the indexer does not check this file any more unless I set
deleted=0 by hand in the db. This is really a problem if a site admin adds
a robots.txt after some heavy access to his site...

2) index -D runs quite slowly. I have a dual Alpha with 1GB of memory for the
db and a dual PIII with 512MB for the indexer. mysql has a 256MB key_buffer.
Loading ranks takes 2 minutes per urlwordsXX table. Saving citation is fast.
After indexing several hundred thousand documents, index -D takes about
2-3 hours. Is this normal? Also, indexing just a few pages triggers index -D
at the end, which takes some time. Is there a flag to perform only the
indexing, without saving delta files, ranking, etc.? Btw, what are the files
in <topdir>var/aspseek/NNw/ used for?


3) I am trying to port aspseek to Linux on Alpha.
I am having some problems with "LONG" issues. Since I would like the db to be
compatible between 64-bit and 32-bit architectures, I have defined LONG and
ULONG to be int on Alpha, and added TLONG and TULONG as true long for mysql
access and compression. The indexer works, but I still have some problems
with searchd. I will send a patch when it is ready.
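
To make the idea concrete, here is a minimal sketch of the type split I have
in mind. It is only an illustration and not the actual patch: the helper
to_disk() is hypothetical, and the real aspseek headers may define these
types differently.

```cpp
#include <cassert>
#include <climits>

// Illustrative only: mirrors the scheme described above, not aspseek's headers.
typedef int           LONG;    // values stored in the db, same width on both hosts
typedef unsigned int  ULONG;
typedef long          TLONG;   // native long: 32 bits on x86, 64 bits on Alpha
typedef unsigned long TULONG;

// Fail at compile time if the stored types are not exactly 32 bits wide.
static_assert(sizeof(LONG) * CHAR_BIT == 32, "LONG must be 32 bits");
static_assert(sizeof(ULONG) * CHAR_BIT == 32, "ULONG must be 32 bits");
static_assert(sizeof(TULONG) >= sizeof(ULONG), "TULONG must hold any ULONG");

// Hypothetical helper: narrow a native value to the stored width,
// asserting that nothing is lost on a 64-bit host.
inline ULONG to_disk(TULONG v) {
    assert(v <= UINT_MAX);
    return static_cast<ULONG>(v);
}
```

With this split, anything written to or read from the db goes through LONG
and ULONG, while TLONG and TULONG stay confined to the calls that really
expect a native long.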

4) Regarding words with accents: there are plenty of pages where they
are written in 7-bit form (ccaron as c, etc.). As with word forms from
ispell, would it be difficult to extend searching for words with accents to
include words without accents? For example, a synonym for
"filipčič" would be "filipcic".

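As a rough sketch of what I mean by the unaccented form, something like the
following could produce the extra search term. This is only an illustration
under my own assumptions: fold_accents() is a hypothetical function, not part
of aspseek, the input is assumed to be UTF-8, and the table lists just the
Slovenian letters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: replace accented letters in a UTF-8 word with
// their plain ASCII counterparts. Unknown sequences are copied unchanged.
std::string fold_accents(const std::string& word) {
    static const std::unordered_map<std::string, char> table = {
        {"\xC4\x8D", 'c'},  // U+010D, c with caron
        {"\xC4\x8C", 'C'},  // U+010C, C with caron
        {"\xC5\xA1", 's'},  // U+0161, s with caron
        {"\xC5\xA0", 'S'},  // U+0160, S with caron
        {"\xC5\xBE", 'z'},  // U+017E, z with caron
        {"\xC5\xBD", 'Z'},  // U+017D, Z with caron
    };
    std::string out;
    for (std::size_t i = 0; i < word.size();) {
        unsigned char b = static_cast<unsigned char>(word[i]);
        // A lead byte of the form 110xxxxx starts a two-byte UTF-8 sequence.
        if ((b & 0xE0) == 0xC0 && i + 1 < word.size()) {
            auto it = table.find(word.substr(i, 2));
            if (it != table.end()) {
                out += it->second;       // known accented letter: fold it
            } else {
                out.append(word, i, 2);  // unknown sequence: keep as is
            }
            i += 2;
        } else {
            out += word[i];              // ASCII or other byte: copy through
            ++i;
        }
    }
    return out;
}

// Usage example: the accented surname folds to its 7-bit spelling.
inline void fold_accents_selfcheck() {
    assert(fold_accents("filip\xC4\x8D" "i\xC4\x8D") == "filipcic");
}
```

A query for the accented word could then be expanded to search for both the
original and the folded form, much like the ispell word forms.
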

Best regards,

Andrej

-- 
_____________________________________________________________
   dr. Andrej Filipcic,        E-mail: [EMAIL PROTECTED]
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674    Fax: +386-1-425-7074
-------------------------------------------------------------


