XMLLibrarian doesn't currently support phrase searches or ranking
results by word proximity, so I don't think common words would be of
any use in searches right now.
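
To illustrate why common words only help once positions are stored,
here is a minimal sketch in Python. It is not XMLLibrarian code; the
function names, the index layout and the two sample documents are all
made up for the example. A phrase match needs each word's position in
the document, which a plain word-to-document index does not keep.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions of the word in that doc]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return the ids of documents containing the words of phrase in order."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return set()
    # Only documents containing every word can contain the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = set()
    for doc_id in candidates:
        # Try each occurrence of the first word as the start of the phrase.
        for start in index[words[0]][doc_id]:
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample documents, invented for this sketch.
docs = {
    "a": "doctor who is a tv series",
    "b": "the doctor asked who was next",
}
index = build_positional_index(docs)
```

With these documents, phrase_search(index, "doctor who") matches only
"a", while an index without positions could not tell "a" from "b",
since both contain "doctor" and "who".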

Also, I'm not sure, but I think the current index doesn't include words
shorter than 4 letters at all.
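
A rough sketch of what such filtering at index time could look like,
again in Python and again not the actual spider code. The stopword set
is just the handful of words listed in the quoted message, and the
4-letter minimum is my unconfirmed guess from above; indexable_words
and both constants are hypothetical names.

```python
# Stopwords taken from the examples in the quoted thread; not a real list.
STOPWORDS = {"the", "be", "to", "of", "and", "a", "in", "i"}
# Assumed minimum word length; the real index may use a different rule.
MIN_LENGTH = 4


def indexable_words(text, stopwords=STOPWORDS, min_length=MIN_LENGTH):
    """Return the lowercased words of text that would be indexed."""
    return [
        w for w in text.lower().split()
        if len(w) >= min_length and w not in stopwords
    ]
```

Note that with a 4-letter minimum, "who" is dropped from "Doctor Who"
whether or not it is on a stopword list, and every word in the sample
stopword set is already too short to be indexed.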



2009/6/10 Matthew Toseland <[email protected]>:
> On Wednesday 10 June 2009 06:54:03 Daniel Cheng wrote:
>> On Wed, Jun 10, 2009 at 12:02 PM, Evan Daniel<[email protected]> wrote:
>> > On my (incomplete) spider index, the index file for the word "the" (it
>> > indexes no other words) is 17MB.  This seems rather large.  It might
>> > make sense to have the spider not even bother creating an index on a
>> > handful of very common words (the, be, to, of, and, a, in, I, etc).
>> > Of course, this presents the occasional difficulty:
>> > http://bash.org/?514353  I think I'm in favor of not indexing common
>> > words even so.
>>
>> Yes, it should ignore common words.
>> These are called "stopwords" in search engine terminology.
>
> How do you propose to implement a search for "doctor who" if "who" is a 
> stopword?
>
> _______________________________________________
> Devl mailing list
> [email protected]
> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>