Hello everyone,

I am a student in the Human-Centered Computing Research Group at Freie
Universität Berlin in Germany, and I am working on a project to semantically
enrich ideas and other short texts.

To evaluate how well people annotate concepts in texts with our software,
we want to compare their annotations to a gold standard of expected
annotations. While working on this, I realized the following: if I use
Precision and Recall, the decision is always binary (right/wrong), but
there are cases where a concept is "kind of right" (see the example
below). Could you recommend approaches or algorithms that deal with
this?

What I'm trying to achieve in detail:

I have a sentence like this: "This pet food distribution center is open now."
I also have my own annotations as well as user-generated annotations for it.
I want to treat my annotations as the gold standard (GS) and compare the
user-generated annotations against that GS.

If I use Precision/Recall/F-Measure, I run into problems, because the
concepts I would consider "best" are

- http://dbpedia.org/resource/Pet_food
- http://dbpedia.org/resource/Food_distribution
- http://dbpedia.org/resource/Distribution_center

But that would mean that, for example, 'Pet' and 'Food' count as being just
as wrong as 'Car' and 'Closet'.
I could annotate redundantly, but that would incentivize users to
over-annotate.
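
To make the problem concrete, here is a minimal Python sketch of what I mean.
Everything in it is made up for illustration (the example sets, the
label_overlap similarity, and the relaxed scoring are just placeholders, not
something we actually use):

def precision_recall_f1(gold, user):
    """Standard exact-match evaluation: a user concept only counts
    if it is literally identical to a gold-standard concept."""
    tp = len(gold & user)
    precision = tp / len(user) if user else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def label_overlap(uri_a, uri_b):
    """Toy similarity, only for illustration: word overlap (Jaccard) of the
    URI labels. A real measure could use DBpedia itself (category or link
    overlap, graph distance) or some other semantic similarity."""
    def tokens(uri):
        return set(uri.rsplit("/", 1)[-1].lower().split("_"))
    a, b = tokens(uri_a), tokens(uri_b)
    return len(a & b) / len(a | b)

def relaxed_precision_recall(gold, user, sim):
    """One idea for partial credit: score each user concept by its best
    similarity to any gold concept, and each gold concept by its best
    match among the user concepts."""
    if not user or not gold:
        return 0.0, 0.0
    precision = sum(max(sim(u, g) for g in gold) for u in user) / len(user)
    recall = sum(max(sim(u, g) for u in user) for g in gold) / len(gold)
    return precision, recall

gold = {
    "http://dbpedia.org/resource/Pet_food",
    "http://dbpedia.org/resource/Food_distribution",
    "http://dbpedia.org/resource/Distribution_center",
}
user_close = {"http://dbpedia.org/resource/Pet",
              "http://dbpedia.org/resource/Food"}
user_wrong = {"http://dbpedia.org/resource/Car",
              "http://dbpedia.org/resource/Closet"}

# Exact matching cannot tell the two users apart: both get (0.0, 0.0, 0.0).
print(precision_recall_f1(gold, user_close))
print(precision_recall_f1(gold, user_wrong))

# A relaxed score at least separates "kind of right" from completely wrong:
print(relaxed_precision_recall(gold, user_close, label_overlap))  # roughly (0.5, 0.33)
print(relaxed_precision_recall(gold, user_wrong, label_overlap))  # (0.0, 0.0)

So what I am really looking for is an established approach that does something
like relaxed_precision_recall properly, instead of me inventing my own
similarity measure.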


Best regards
Maximilian Stauss
HCC | FU Berlin


