On 04/17/2012 04:00 PM, Jim - FooBar(); wrote:
On 17/04/12 13:52, Jörn Kottmann wrote:
If you don't want to handle these cases, you can simply copy all
names together
into a list, and then do evaluation on this list.
This approach works with our evaluation, but will usually be an issue
for applications which expect output
where the ambiguities mentioned earlier are resolved.
That is exactly what my current AggregateNameFinder does... It just
gets rid of duplicates...
I propose that we make a simple baseline implementation
which takes all output spans, orders them and then resolves
the ambiguities based on the order. This will prefer longer
names over shorter names, but ignores the type.
There are more sophisticated ways of handling this,
e.g. taking probabilities from the statistical name finders into
account, but these might be a bit more restrictive as well.
I agree on the baseline implementation but I don't see why the spans
need to be ordered and why ambiguities need resolving...the only true
ambiguity that can occur is having the exact same span with a
different type in which case we need to make a decision. Taking the
probabilities from maxent is also a bit naive because you will not
know which model to trust (maybe the weakest model gives you highest
probs)...
You can have overlapping spans, which usually indicate a
classification mistake and cannot be handled nicely
by applications which expect non-overlapping output, as a single name
finder produces.
Therefore they should be resolved by the baseline.
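
A rough sketch of what such a baseline could look like, for illustration
only. It assumes one possible ordering (longest span first, ties broken by
start offset) and greedily drops every span that overlaps one already kept.
NameSpan and BaselineSpanMerger are hypothetical names made up for this
example, not existing OpenNLP classes, and the type is ignored when
resolving, as proposed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical baseline: merges the output of several name finders
// into one non-overlapping list of spans.
final class BaselineSpanMerger {

    // Hypothetical stand-in for a name span: token offsets [start, end) plus a type.
    public static final class NameSpan {
        public final int start;
        public final int end;
        public final String type;

        public NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        public int length() {
            return end - start;
        }

        // Two half-open ranges overlap if each starts before the other ends.
        public boolean overlaps(NameSpan other) {
            return start < other.end && other.start < end;
        }
    }

    // Returns a non-overlapping subset of the input, sorted by start offset.
    public static List<NameSpan> resolve(List<NameSpan> all) {
        List<NameSpan> ordered = new ArrayList<>(all);
        // Assumed ordering: longer spans first, then earlier start.
        // List.sort is stable, so spans that compare equal keep their input order.
        ordered.sort((a, b) -> {
            int byLength = Integer.compare(b.length(), a.length());
            return byLength != 0 ? byLength : Integer.compare(a.start, b.start);
        });

        List<NameSpan> kept = new ArrayList<>();
        for (NameSpan candidate : ordered) {
            boolean clash = false;
            for (NameSpan k : kept) {
                if (candidate.overlaps(k)) {
                    clash = true;
                    break;
                }
            }
            // The type is deliberately ignored: the first span in the order wins.
            if (!clash) {
                kept.add(candidate);
            }
        }

        kept.sort((a, b) -> Integer.compare(a.start, b.start));
        return kept;
    }
}
```

With this ordering, an identical span reported with two different types
is resolved in favour of whichever name finder's output was added to the
list first, which is arbitrary but at least deterministic.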
Jörn