2009/6/10 Daniel Cheng <j16sdiz+freenet at gmail.com>:
> On 10/6/2009 20:42, Mike Bush wrote:
>> 2009/6/10 Evan Daniel <evanbd at gmail.com>:
>>> On Wed, Jun 10, 2009 at 6:49 AM, Mike Bush <mpbush at gmail.com> wrote:
>>>> XMLLibrarian doesn't currently support searching for phrases or rating
>>>> relevance of results based on proximity so I don't think common words
>>>> could be of any use in searches now.
>>>>
>>>> Also, I'm not sure but I think the current index doesn't include words
>>>> under 4 letters at all.
>>> If you read my previous mails, you'll see that the spider is in
>>> fact indexing the word "the".
>>>
>>
>> Yes sorry, I've since searched for 'who' on wanna and it is there, it
>> gave me OutOfMemoryException trying to generate the results page
>>
>
> You have got it :)
>
> This is yet another reason to split the <site> part out.

I've also built two indexes to measure the space saved by separating
keys from words. For an index of more than 16000 keys with 256
subindexes:

  Normal index, with keys integrated in the files: >400MB
  With keys in a separate key index (3MB): 160MB in total

Of course the difference wouldn't be so large if the index weren't
split into so many pieces.

One thing I worried about was that the file index would get very
large, but for the key index to be bigger than even one of wanna's
subindexes it would have to contain more than 320000 keys. How many
keys do very large indexes have?

MikeB

> In which we may keep in memory the siteId only, not the whole uri, before the
> union.
>
> Even so, I suspect searching words like "the who" will never work without
> on-disk temp files.
>
>>> Evan Daniel
>>>
> _______________________________________________
> Devl mailing list
> Devl at freenetproject.org
> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>
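
For illustration only, here is a minimal sketch of the key/word separation discussed above. It is written in Python, not taken from XMLLibrarian or the spider. The function names, the sample URIs and the repr-based size measure are all assumptions made up for this example; the real index is a set of XML files and will behave differently in detail. The point it shows is that a long URI stored once in a key table, with small integer ids in the word entries, takes less room than the same URI repeated under every word, and that a search can combine id sets and look up the URIs only at the end.

```python
from collections import defaultdict


def build_inline(pages):
    """Word -> list of full URIs (each URI is repeated under every word)."""
    index = defaultdict(list)
    for uri, words in pages.items():
        for word in sorted(set(words)):
            index[word].append(uri)
    return dict(index)


def build_separated(pages):
    """Key table (list position is the id) plus word -> list of integer ids."""
    keys = list(pages)
    ids = {uri: i for i, uri in enumerate(keys)}
    index = defaultdict(list)
    for uri, words in pages.items():
        for word in sorted(set(words)):
            index[word].append(ids[uri])
    return keys, dict(index)


def search(keys, index, terms, require_all=False):
    """Combine per-term id sets, then resolve ids to URIs only at the end."""
    id_sets = [set(index.get(term, ())) for term in terms]
    if not id_sets:
        return []
    if require_all:
        hits = set.intersection(*id_sets)   # pages containing every term
    else:
        hits = set().union(*id_sets)        # pages containing any term
    return [keys[i] for i in sorted(hits)]


def stored_chars(obj):
    """Crude size proxy (an assumption): length of the repr() of the data."""
    return len(repr(obj))


if __name__ == "__main__":
    # Hypothetical pages; the URIs are placeholders, not real Freenet keys.
    pages = {
        "USK@" + "A" * 60 + "/music/1/": ["the", "who", "band"],
        "USK@" + "B" * 60 + "/blog/3/": ["the", "spider", "index"],
        "USK@" + "C" * 60 + "/faq/2/": ["who", "index"],
    }
    inline = build_inline(pages)
    keys, separated = build_separated(pages)
    print("inline size:   ", stored_chars(inline))
    print("separated size:", stored_chars((keys, separated)))
    print(search(keys, separated, ["the", "who"], require_all=True))
```

The saving grows with the number of words per page, since each extra word costs one small id instead of one more copy of the URI. It does nothing for the memory needed to hold the id sets of very common words such as "the", which is the case the quoted message expects to need on-disk temp files.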