Tom Loosemore wrote:
Thanks for the feedback!

Muddy boots is cool...

Thanks :)
TheyWorkForYou.com adds links to Hansard by matching Proper Names with
Wikipedia entries.
http://www.theyworkforyou.com/debates/?id=2007-11-21a.1190.1

The number of false positives is acceptable and the Wikipedia links are
miles better than the user-generated glossary with which the site was
launched. But it's still limited since it only parses for Capitalised
Phrases or ACRONYMS.
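
To make that limitation concrete, here is a minimal sketch of this kind of matching in Python. It is not TheyWorkForYou's actual code: the regular expression, the function names and the tiny WIKIPEDIA_TITLES set are all assumptions for illustration, and a real version would check candidates against Wikipedia itself.

```python
import re

# Assumption: a "Capitalised Phrase" is two or more capitalised words in a
# row, and an ACRONYM is two or more consecutive capital letters.
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+|[A-Z]{2,})\b")

# Hypothetical stand-in for a lookup against Wikipedia article titles.
WIKIPEDIA_TITLES = {"Prime Minister", "NATO"}


def candidate_phrases(text):
    """Return capitalised phrases and acronyms found in the text, in order."""
    return CANDIDATE.findall(text)


def linkable_phrases(text, titles=WIKIPEDIA_TITLES):
    """Keep only the candidates that match a known article title."""
    return [phrase for phrase in candidate_phrases(text) if phrase in titles]
```

Anything written in lower case, such as "term extraction", never becomes a candidate here, so it can never be linked.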

Shifting to term extraction seemed an obvious route, but as I think
Muddy Boots shows, term extraction tends to throw up an unacceptably
large number of 'false positive' terms. These result in crappy random
links and are user experience poison.

However, you can minimise "false positive" terms by running the copy
through several different flavours of term extractor, and only using
terms thrown up by x or more of them (where x depends on your appetite
for false positives vs false negatives).
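
A rough sketch of that voting idea in Python, under stated assumptions: each extractor is just a function from text to a list of terms, and the three toy extractors below are hypothetical stand-ins for real term extraction services. The min_votes parameter plays the role of x.

```python
from collections import Counter


def consensus_terms(text, extractors, min_votes):
    """Return the terms proposed by at least `min_votes` of the extractors."""
    votes = Counter()
    for extract in extractors:
        # Normalise and de-duplicate so each extractor votes once per term.
        terms = {term.strip().lower() for term in extract(text)}
        terms.discard("")
        votes.update(terms)
    return {term for term, count in votes.items() if count >= min_votes}


# Toy extractors (hypothetical), each with its own kind of false positive.
def capitalised_words(text):
    return [w.strip(".,") for w in text.split() if w[:1].isupper()]


def long_words(text):
    return [w.strip(".,") for w in text.split() if len(w.strip(".,")) >= 7]


def known_names(text):
    return [name for name in ("Hansard", "Wikipedia") if name in text]


if __name__ == "__main__":
    copy = "Hansard links to Wikipedia entries."
    extractors = [capitalised_words, long_words, known_names]
    print(sorted(consensus_terms(copy, extractors, min_votes=2)))
```

Raising min_votes trades false positives for false negatives: with min_votes=1 the stray term "entries" from long_words gets through, while with min_votes=2 it is dropped.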

I like this idea, since the context for the story (i.e. the tags we use to define it) obviously impacts the final link recommendations. That is one of the two weak points in the system at the moment (the other being the previously mentioned disambiguation issues). However, it's nice to have a platform where we can start to test these kinds of ideas out ...
So, why not throw the copy through several more term extractors, then
only use the overlapping terms?

- The BBC has at least one *excellent* term extractor in house which
adds extra metadata like 'this term is a person/place/topic'... would
be a lovely API to offer, hint hint...

Seconded! Does anybody else have any recommendations for term extraction services?
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/[email protected]/
