Re: Merging the output of multiple name finders

Jörn Kottmann Tue, 17 Apr 2012 06:04:28 -0700

I propose that we make a simple baseline implementation
which takes all output spans, orders them and then resolves
the ambiguities based on that order. This will prefer longer
names over shorter ones, but it ignores the type.
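A minimal sketch of the proposed baseline, assuming spans are token offsets with an exclusive end. The `NameSpan` record and `merge` method are hypothetical illustrations, not the OpenNLP API. Sorting by start offset, with the longer span first on ties, means a nested name always loses to the name that contains it; for crossing spans the one that starts earlier wins, and the type is never consulted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BaselineSpanMerger {

    // Hypothetical span type: token offsets [start, end) plus the name type.
    public record NameSpan(int start, int end, String type) {
        boolean overlaps(NameSpan other) {
            return start < other.end && other.start < end;
        }
    }

    // Baseline merge: sort all spans (by start, longer span first on ties),
    // then keep each span only if it does not overlap a span already kept.
    // The type plays no role in resolving conflicts.
    public static List<NameSpan> merge(List<NameSpan> spans) {
        List<NameSpan> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(NameSpan::start)
                .thenComparing((a, b) -> Integer.compare(b.end(), a.end())));

        List<NameSpan> kept = new ArrayList<>();
        for (NameSpan candidate : sorted) {
            boolean conflict = kept.stream().anyMatch(k -> k.overlaps(candidate));
            if (!conflict) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // B (tokens 2..3) is nested inside A (tokens 0..4), so only A survives.
        List<NameSpan> merged = merge(List.of(
                new NameSpan(2, 3, "B"),
                new NameSpan(0, 4, "A")));
        System.out.println(merged.size() + " " + merged.get(0).type()); // prints "1 A"
    }
}
```

Because `List.sort` is stable, two identical spans with different types keep their input order, so the first finder's type wins.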


There are more sophisticated ways of handling this,
e.g. taking the probabilities from the statistical name finders into
account, but these might be a bit more restrictive as well.
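One possible probability-aware variant, sketched under the assumption that every finder attaches a probability to each span (the `ScoredSpan` record, its `prob` field and the `merge` method are hypothetical). It is a plain greedy heuristic, not necessarily what is meant above: spans are visited from most to least probable, and a span is kept only if it does not overlap one already kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ProbabilitySpanMerger {

    // Hypothetical span type: token offsets [start, end), the name type, and the
    // probability reported by the statistical name finder that produced it.
    public record ScoredSpan(int start, int end, String type, double prob) {
        boolean overlaps(ScoredSpan other) {
            return start < other.end && other.start < end;
        }
    }

    // Greedy merge: visit spans from most to least probable and keep a span only
    // if it does not overlap any span that was already kept.
    public static List<ScoredSpan> merge(List<ScoredSpan> spans) {
        List<ScoredSpan> byProb = new ArrayList<>(spans);
        byProb.sort((a, b) -> Double.compare(b.prob(), a.prob()));

        List<ScoredSpan> kept = new ArrayList<>();
        for (ScoredSpan candidate : byProb) {
            if (kept.stream().noneMatch(k -> k.overlaps(candidate))) {
                kept.add(candidate);
            }
        }
        // Return the surviving names in document order.
        kept.sort(Comparator.comparingInt(ScoredSpan::start));
        return kept;
    }
}
```

One visible restriction of this sketch is that it needs comparable scores from every finder, which a plain dictionary lookup does not produce.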

It's always good to have a simple baseline, to see how much
something more complicated improves on it.

Any opinions?

Jörn

On 04/17/2012 02:52 PM, Jörn Kottmann wrote:

If you don't want to handle these cases, you can simply copy all names together
into a list and then run the evaluation on this list.
This approach works with our evaluation, but it will usually be an issue for applications which expect output
where the ambiguities mentioned earlier are resolved.

Jörn

On 04/17/2012 02:38 PM, Jim - FooBar(); wrote:
Ok, first of all, you're referring to the final merging (AggregateNameFinder) and not to the multiple dictionaries, where no merging occurs... Anyway, let's deal with this for now. Let's see...
- Two names can be identical and have the same type or a different type
Well, if the type is different, the spans are not identical (equal), so you keep both and do some reasoning over them (see below). If the type is the same and the spans cover the same text, then they are equal, so you only keep one of them.
- Two names have intersecting spans
It is very unlikely that both are correct, so in the simplest case of keeping them both you may lose some precision. However, considering how rarely that happens, it becomes unimportant. Or you could do some reasoning (see below) again if they have the same type. If they don't have the same type, then why not keep them both again?
- One name is contained in another like this:
<START:A> a b <START:B> c <END:B> d <END:A>
Well, this is conceptually exactly the same case as before. If they have the same type, it's very likely that one is wrong. You can do the same sort of reasoning as above. If they don't, there is no way to know with confidence what to do, so I say keep them both.
The reasoning I'm referring to is simply to *trust the dictionary* (if one exists). If one doesn't exist and one is trying to merge results from several maxent models, for example, then we cannot make an informed decision. It is only the dictionary that can provide facts; all the rest are probabilities...
Jim
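Jim's rules can be sketched as follows, assuming each span records whether it came from a dictionary or from a statistical model. The `fromDictionary` flag and the class, record and method names are hypothetical. Exact duplicates with the same type collapse into one; spans of different types are always kept; among overlapping spans of the same type, a model span is dropped only when it conflicts with a dictionary span, and otherwise both are kept, as in the "simplest case" described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DictionaryTrustingMerger {

    // Hypothetical span type: token offsets [start, end), the name type, and a
    // flag telling whether a dictionary lookup (rather than a model) produced it.
    public record NameSpan(int start, int end, String type, boolean fromDictionary) {
        boolean overlaps(NameSpan other) {
            return start < other.end && other.start < end;
        }
    }

    // Key used to detect "same span, same type" duplicates.
    private record Key(int start, int end, String type) {}

    public static List<NameSpan> merge(List<NameSpan> spans) {
        // Identical span and identical type: keep only one, preferring the
        // dictionary result when both sources found it.
        Map<Key, NameSpan> unique = new LinkedHashMap<>();
        for (NameSpan s : spans) {
            Key key = new Key(s.start(), s.end(), s.type());
            NameSpan previous = unique.get(key);
            if (previous == null || (!previous.fromDictionary() && s.fromDictionary())) {
                unique.put(key, s);
            }
        }

        // Spans of different types are always kept. Among overlapping spans of
        // the same type, a model span is dropped only if a dictionary span
        // contradicts it ("trust the dictionary"); otherwise both are kept.
        List<NameSpan> candidates = new ArrayList<>(unique.values());
        List<NameSpan> result = new ArrayList<>();
        for (NameSpan s : candidates) {
            boolean overruled = !s.fromDictionary() && candidates.stream().anyMatch(o ->
                    o.fromDictionary() && o.type().equals(s.type()) && o.overlaps(s));
            if (!overruled) {
                result.add(s);
            }
        }
        return result;
    }
}
```

When no dictionary is involved (e.g. several maxent models), this sketch never drops anything beyond exact duplicates, which matches the point that no informed decision can be made in that case.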
Hi all,

In one of the JIRA issues we started a discussion about merging the output
of multiple name finders and which conflicts exist.
Let's move it back to the dev list.

The merging code needs to handle these cases:
- Two names can be identical and have the same type or a different type.
- Two names have intersecting spans like this:
<START:A> a b <START:B> c <END:A> d <END:B>

- One name is contained in another like this:
<START:A> a b <START:B> c <END:B> d <END:A>

Depending on the use case and merging logic it might be resolved
differently.
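The cases listed above can be told apart with a small classifier over token offsets. This is a sketch assuming exclusive end offsets, with the tokens numbered a=0, b=1, c=2, d=3; the `NameSpan`, `Relation` and `classify` names are hypothetical. `DISJOINT` is included only to show the case that needs no conflict resolution at all.

```java
public class SpanRelations {

    // Hypothetical span type: token offsets [start, end) plus the name type.
    public record NameSpan(int start, int end, String type) {}

    public enum Relation { IDENTICAL_SAME_TYPE, IDENTICAL_DIFFERENT_TYPE, CONTAINED, INTERSECTING, DISJOINT }

    public static Relation classify(NameSpan a, NameSpan b) {
        // Same offsets: the type decides whether this is a true duplicate.
        if (a.start() == b.start() && a.end() == b.end()) {
            return a.type().equals(b.type())
                    ? Relation.IDENTICAL_SAME_TYPE
                    : Relation.IDENTICAL_DIFFERENT_TYPE;
        }
        // One span lies completely inside the other.
        boolean aInB = b.start() <= a.start() && a.end() <= b.end();
        boolean bInA = a.start() <= b.start() && b.end() <= a.end();
        if (aInB || bInA) {
            return Relation.CONTAINED;
        }
        // Partial overlap: the spans cross each other's boundaries.
        if (a.start() < b.end() && b.start() < a.end()) {
            return Relation.INTERSECTING;
        }
        return Relation.DISJOINT;
    }

    public static void main(String[] args) {
        // <START:A> a b <START:B> c <END:A> d <END:B>
        System.out.println(classify(new NameSpan(0, 3, "A"), new NameSpan(2, 4, "B"))); // prints INTERSECTING
        // <START:A> a b <START:B> c <END:B> d <END:A>
        System.out.println(classify(new NameSpan(0, 4, "A"), new NameSpan(2, 3, "B"))); // prints CONTAINED
    }
}
```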

Jörn
