On Thu, 19 Apr 2001, Sam Joseph wrote:

> 
> Well, that's kind of what I'm working on with NeuroGrid now.  It's not
> set up yet, but my approach is to get a person's bookmark file, extract
> all of the urls out of it, download each of those pages, chew them up,
> spit out all the tags, and then use some basic information retrieval
> statistics (like TFIDF - term frequency inverse document frequency) to
> work out which subset of keywords are relevant and use those as the
> basis for a user's NeuroGrid profile.
> 
> One could go so far as to try and create ranks based on the TFIDF and
> then translate them into usage ranks, like the ones I described, but I
> think they are just a very different kind of thing, and the idea with NG
> is that users should be able to edit all the associations between
> keywords and their bookmarks, it should all be personalised.  So I would
> imagine using the bookmark file as a way to get some urls into the
> system, a little TFIDF to provide base associations and then let the
> searching do its work.  NG searching allows urls to become associated
> with other keywords through multiple keyword searches and so on, so I'm
> kind of putting my trust in that, rather than some information
> theoretical scheme that allegedly works out the *best* representation
> for the data.
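
For anyone following along, here is a rough sketch of the TFIDF step Sam
describes: score each term by how often it appears in a page, discounted
by how many pages contain it, and keep the top few as keywords. The
function names, the dict-of-pages input and the top_n cutoff are my own
assumptions for illustration, not anything from NeuroGrid itself, and a
real version would need to strip HTML tags first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only; HTML stripping is omitted here.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(pages, top_n=3):
    """Return the top_n highest-scoring terms for each page.

    pages: dict mapping url -> page text (hypothetical input format).
    """
    tokens = {url: tokenize(text) for url, text in pages.items()}
    n_docs = len(tokens)

    # Document frequency: how many pages contain each term at least once.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    result = {}
    for url, words in tokens.items():
        counts = Counter(words)
        total = len(words) or 1
        # Term frequency times log inverse document frequency.
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[url] = ranked[:top_n]
    return result
```

Terms that occur on every page get a score of zero, so common words drop
to the bottom of each page's list without needing a stop-word list.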

So, use a synthetic summarizer like TFIDF and combine it with the NG
ranking, phasing the TFIDF part out as the confidence level rises?
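
To make the question concrete, something like the following is what I
have in mind. This is only a sketch under my own assumptions: the
confidence formula, the half_weight parameter and the idea of counting
"observations" (searches that touched the association) are all made up
for illustration and are not how NG actually works.

```python
def blended_rank(tfidf_score, usage_score, n_observations, half_weight=5.0):
    """Blend a TFIDF-derived score with a usage-derived score.

    Confidence in the usage score grows from 0 toward 1 as observations
    accumulate; half_weight (an assumed tuning knob) is the number of
    observations at which both scores are weighted equally.
    """
    confidence = n_observations / (n_observations + half_weight)
    return (1.0 - confidence) * tfidf_score + confidence * usage_score
```

With no observations the result is just the TFIDF score, and as searches
accumulate the usage score takes over.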

I guess the trick is coming up with a synthetic summarizer that can come
close to matching NG concept associations.  The trouble with this is that
the relevancy of concept associations may change and new concepts may be
added as data is introduced.

I think ranking should be calculated differently based on the search
type anyway.  TFIDF and other summarizing methods are good at generalizing
a document, but are bad at finding specific information.

> 
> I think that data should be represented in a way that reflects the way
> it gets used.
> 
> CHEERS> SAM
> 
> p.s. any tips on how I can get my mails to follow the threading in these
> lists.  I'm beginning to think my only option is to re-subscribe and
> receive individual messages.
> 

BTW, doesn't gaijin mean foreigner? :)


_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://lists.freenetproject.org/mailman/listinfo/devl
