How do you plan to combat littering of the search results - like the way a
lot of spam pages on the web attach just about any popular keyword to their
meta-data?
I see this as a potentially large problem, because deciding which hits
actually have something to do with the search terms is an inherently
subjective human judgement. The best solution I know of is Google, which
rates a document by the number of references to it among the other documents
matching the keywords. Since linking to a page is usually a way of approving
of it, this turns out to work very well.
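
Roughly the kind of thing Google does, as I understand it - here is a minimal
sketch in Java; the link map and the way the matching set is gathered are just
my illustration, not anything that exists in our code:

    import java.util.*;

    // Sketch: among the documents matching a keyword, score each document by
    // how many of the *other* matching documents reference it, sort best-first.
    class ReferenceRank {
        static List<String> rank(Map<String, Set<String>> links,
                                 Set<String> matching) {
            Map<String, Integer> score = new HashMap<>();
            for (String from : matching) {
                for (String to : links.getOrDefault(from, Collections.emptySet())) {
                    if (matching.contains(to) && !to.equals(from))
                        score.merge(to, 1, Integer::sum);  // one more "vote" for 'to'
                }
            }
            List<String> result = new ArrayList<>(matching);
            result.sort((a, b) -> score.getOrDefault(b, 0) - score.getOrDefault(a, 0));
            return result;
        }
    }
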
But I don't see any way we could do that. Possibly we could rate the search
results by some measure of how common the data is, as a proxy for its
popularity. This could work if only a portion of the requests for data come
as the result of searches - otherwise it simply becomes a positive feedback
loop rewarding whoever attached the most keywords first. Also, a document may
be very popular (say, a porn page) without deserving to be highly rated for
the keyword "teletubbies".
I'm guessing you want to see some sort of voting, which might work, but I
don't know of any search engines that have users vote on the validity of the
hits, so it is hard to say exactly...
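
To make the popularity idea concrete, here is a rough sketch of the rating I
have in mind; the counters and the weighting are made up for illustration,
nothing like this exists in the code:

    // Sketch: rate a hit for a given keyword by combining how often the data
    // is requested outside of searches (popularity) with user votes on whether
    // the hit actually matched the keyword. The weights are arbitrary.
    class HitRating {
        long plainRequests;   // requests that did not come from a search result
        long votesFor;        // users who said the hit matched the keyword
        long votesAgainst;    // users who said it did not

        double score() {
            double popularity = Math.log(1 + plainRequests);
            double validity = (votesFor + 1.0) / (votesFor + votesAgainst + 2.0);
            return popularity * validity;   // popular but irrelevant hits score low
        }
    }
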
On Thu, 18 May 2000, Ian Clarke wrote:
> > It depends on how the search data is keyed. Yes, it will be separate from
> > the actual data, but we have to determine just how we route search
> > requests.
>
> Er, not only do we have a proposal for this, but I have already begun to
> implement it!
>
> The idea is that you have a search query like "contains("hello") and
> contains("goodbye") or matches("fred")". On recieving a request for
> data matching this key, each plain-text key in the datastore is compared
> against the search and given a score between 0 and 1, 1 being a perfect
> match (fuzzy logic is used to determine this score). If any data gets a
> 1 then it is returned immediately. If not, the best score on the node
> is checked against one or more "best so far" scores transmitted with the
> search. If one of our local keys beats these scores then it is added to
> this list in the appropriate place. The request is then forwarded in
> the usual manner to the node corresponding to the highest scoring key.
> When the request times out or finds a perfect match it passes back along
> the path, bringing with it the list of "best-so-far" keys.
>
> This way we get a Yahoo- or AltaVista-style search mechanism which will
> give us a nice list of closest/next-closest keys.
>
> I have only begun to implement this but take a look at the
> Freenet.search package to get an idea of what I am talking about.
>
> Ian.
>
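
To make the mechanism Ian describes concrete, here is a rough sketch of what
one routing step might look like; the class and method names are mine, not
the actual Freenet.search code:

    import java.util.*;

    // Score every plain-text key in the local store against the query, return
    // a perfect match at once, otherwise merge our best local key into the
    // "best so far" list carried with the request and forward the request
    // toward the node corresponding to the highest-scoring key.
    class SearchStep {
        interface Query { double score(String key); }  // fuzzy score in [0, 1]

        static String bestLocalKey(Query query, Set<String> localKeys,
                                   List<String> bestSoFar, int maxBest) {
            String best = null;
            double bestScore = -1;
            for (String key : localKeys) {
                double s = query.score(key);
                if (s >= 1.0) return key;          // perfect match: return data now
                if (s > bestScore) { bestScore = s; best = key; }
            }
            if (best != null) {
                // Insert into the best-so-far list in score order, keep it bounded.
                bestSoFar.add(best);
                bestSoFar.sort((a, b) -> Double.compare(query.score(b), query.score(a)));
                while (bestSoFar.size() > maxBest)
                    bestSoFar.remove(bestSoFar.size() - 1);
            }
            return best;   // caller forwards the request toward the node for this key
        }
    }
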
--
Oskar Sandberg
md98-osa at nada.kth.se
_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev