- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Luca Pellegrini
Subject: Re: questions and suggestions on dpsearch

Hi Maxime,
As I keep using DataParkSearch, I have more and more questions:
1) What exactly is IndexDocSizeLimit? I know from the documentation that it 
corresponds to the amount of data stored in the index per document. What do 
you mean by "amount of data"? Is it the same as MaxDocSize?
2) I'm trying to index 20 .it domains and the indexer seems to index (using 
crc-multi) 10 GB of data. I think this is a lot; it is due to the fact 
that each word found in each web page is saved in the database. Do you 
think that we can save some space using cache-mode indexing (or any other 
indexing technique)? Is there a way to have a sort of "lookup table" containing 
only the dictionary of indexed words?
3) Is there a way to tell the indexer to avoid indexing stopwords?
4) Is there a way to tell the indexer to avoid indexing a document if a certain 
string is found in the document body? (For example: if it's a blog page, 
don't index that page.) I think that NoIndexIf could be used for this purpose, 
but how?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1162208106
