On 04/17/2012 04:00 PM, Jim - FooBar(); wrote:
On 17/04/12 13:52, Jörn Kottmann wrote:
If you don't want to handle these cases, you can simply copy all names together
into a list, and then do evaluation on this list.
This approach works with our evaluation, but will usually be an issue for applications which expect output
where the ambiguities mentioned earlier are resolved.

That is exactly what my current AggregateNameFinder does... It just gets rid of duplicates...
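
To make the "copy everything into one list and drop duplicates" idea concrete, here is a minimal sketch. It is not the actual AggregateNameFinder code; the `Span` tuple and the `aggregate` function are hypothetical names made up for illustration, and spans are assumed to be token offsets with an exclusive end.

```python
from typing import Iterable, List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for a name span (not the OpenNLP class)."""
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    type: str   # e.g. "person", "location"


def aggregate(outputs: Iterable[Iterable[Span]]) -> List[Span]:
    """Concatenate the spans of all finders, dropping exact duplicates.

    Two spans count as duplicates only if start, end and type all match.
    Overlapping or conflicting spans are deliberately left in the result.
    """
    seen = set()
    merged = []
    for spans in outputs:
        for span in spans:
            if span not in seen:
                seen.add(span)
                merged.append(span)
    return merged


finder_a = [Span(0, 2, "person"), Span(5, 6, "location")]
finder_b = [Span(0, 2, "person"), Span(1, 3, "organization")]
merged = aggregate([finder_a, finder_b])
```

Note that `merged` still contains the overlapping pair (0, 2) and (1, 3), which is the case discussed further down in this thread.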

I propose that we make a simple baseline implementation
which takes all output spans, orders them and then resolves
the ambiguities based on the order. This will prefer longer
names over shorter names, but ignores the type.
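
A rough sketch of such a baseline could look like the following. The exact ordering is not spelled out above, so this assumes one possible choice: longest span first, ties broken by start position, then greedily keeping every span that does not overlap an already kept one. `Span` and `resolve_overlaps` are hypothetical names used only for this illustration.

```python
from typing import Iterable, List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for a name span (not the OpenNLP class)."""
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    type: str   # carried along, but ignored when resolving


def resolve_overlaps(spans: Iterable[Span]) -> List[Span]:
    """Return a non-overlapping subset of spans, preferring longer ones."""
    # Assumed ordering: longest first, then leftmost first. sorted() is
    # stable, so for identical offsets the span seen first in the input wins.
    ordered = sorted(spans, key=lambda s: (-(s.end - s.start), s.start))
    kept: List[Span] = []
    for span in ordered:
        # Keep the span only if it lies entirely before or after each kept span.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    # Hand the result back in document order.
    return sorted(kept, key=lambda s: s.start)


candidates = [
    Span(0, 2, "person"),
    Span(1, 5, "organization"),
    Span(0, 2, "location"),
    Span(6, 7, "location"),
]
resolved = resolve_overlaps(candidates)
```

Here the four-token span (1, 5) wins over both two-token spans at (0, 2), whatever their types, and (6, 7) is kept because it overlaps nothing.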

There are more sophisticated ways of handling this,
e.g. taking probabilities from the statistical name finders into
account, but these might be a bit more restrictive as well.

I agree on the baseline implementation, but I don't see why the spans need to be ordered or why ambiguities need resolving. The only true ambiguity that can occur is having the exact same span with a different type, in which case we need to make a decision. Taking the probabilities from maxent is also a bit naive, because you will not know which model to trust (maybe the weakest model gives you the highest probabilities)...

You can have overlapping spans, which usually indicate a classification mistake and cannot be handled nicely by applications that expect non-overlapping output, as a single name finder produces.
Therefore they should be resolved by the baseline.

Jörn
