Re: Namefinder Changes

Jörn Kottmann Wed, 05 Mar 2014 11:32:38 -0800

Have a look at the Sequence Coding thread here on the list.


The name finder has always used IOB2 coding by default. We have now made this
configurable, so it can be replaced by other codecs such as BILOU or, once
the work is done, by a user-implemented codec.

To detect names in a sentence, the name finder uses a learnable classifier. The classifier has to decide whether a token is part of a name or not. The logic that determines which labels are used to encode/decode name spans is now the responsibility of the SequenceCodec object.

In the IOB2 codec (see the BioCodec class) the tokens are labeled as Begin, Inside, or Other.

Each new name span has to start with the Begin label.
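
To make the IOB2 scheme concrete, here is a minimal, self-contained sketch of encoding and decoding. It does not use the OpenNLP classes; the class name Iob2Sketch, the single-letter labels "B", "I", "O" and the int[]{start, end} span representation are assumptions made for illustration only, and the real BioCodec may use different label strings and types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical stand-in, not the OpenNLP BioCodec.
final class Iob2Sketch {

    // Encodes name spans as one label per token.
    // Each span is {start, end} with start inclusive and end exclusive;
    // spans are assumed to be non-empty and non-overlapping.
    public static String[] encode(int[][] spans, int length) {
        String[] labels = new String[length];
        Arrays.fill(labels, "O");
        for (int[] span : spans) {
            labels[span[0]] = "B";               // every name starts with Begin
            for (int i = span[0] + 1; i < span[1]; i++) {
                labels[i] = "I";                 // remaining tokens are Inside
            }
        }
        return labels;
    }

    // Decodes a label sequence back into {start, end} spans.
    public static List<int[]> decode(String[] labels) {
        List<int[]> spans = new ArrayList<>();
        int start = -1;                          // -1 means "not inside a name"
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals("B")) {
                if (start != -1) {
                    spans.add(new int[] {start, i});  // close the previous name
                }
                start = i;
            } else if (labels[i].equals("O")) {
                if (start != -1) {
                    spans.add(new int[] {start, i});
                }
                start = -1;
            }
            // "I" extends the current name; an "I" without a preceding "B" is ignored.
        }
        if (start != -1) {
            spans.add(new int[] {start, labels.length});
        }
        return spans;
    }

    public static void main(String[] args) {
        // Tokens: Mark G met Jörn Kottmann, with names at [0,2) and [3,5)
        String[] labels = encode(new int[][] {{0, 2}, {3, 5}}, 5);
        System.out.println(Arrays.toString(labels)); // prints [B, I, O, B, I]
    }
}
```

Because a new name always starts with Begin, two names that directly follow each other (B I B I) can still be told apart when decoding.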

The BILOU codec uses the following labels: Begin, Inside, Last, Unit and Other.
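
For comparison, a similar sketch of BILOU encoding. Again this is not the OpenNLP implementation; the class name BilouSketch, the single-letter labels and the int[]{start, end} span representation are hypothetical and chosen only to show how the five labels are assigned.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical stand-in, not the OpenNLP BILOU codec.
final class BilouSketch {

    // Encodes name spans as one label per token.
    // Each span is {start, end} with start inclusive and end exclusive;
    // spans are assumed to be non-empty and non-overlapping.
    public static String[] encode(int[][] spans, int length) {
        String[] labels = new String[length];
        Arrays.fill(labels, "O");
        for (int[] span : spans) {
            int start = span[0];
            int end = span[1];
            if (end - start == 1) {
                labels[start] = "U";             // single-token name: Unit
            } else {
                labels[start] = "B";             // first token: Begin
                for (int i = start + 1; i < end - 1; i++) {
                    labels[i] = "I";             // middle tokens: Inside
                }
                labels[end - 1] = "L";           // final token: Last
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Six tokens with a one-token name at [0,1) and a three-token name at [2,5)
        String[] labels = encode(new int[][] {{0, 1}, {2, 5}}, 6);
        System.out.println(Arrays.toString(labels)); // prints [U, O, B, I, L, O]
    }
}
```

The same spans in IOB2 would need only three distinct labels, so the classifier has fewer outcomes to choose between per token.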

There might be advantages to switching the codec depending on the data you are using; in the German CONLL03 data the evaluation results are slightly better with BILOU instead of IOB2.

The BILOU codec uses more labels and will be more resource intensive compared to IOB2.


Also have a look at the Wikipedia article about IOB:
http://en.wikipedia.org/wiki/Inside_Outside_Beginning

HTH,
Jörn

On 03/05/2014 02:18 PM, Mark G wrote:

Hello, I updated the tools trunk two days ago and stopped getting NER
results. I chatted with Joern and he made a change to the seq codec that
brought everything back to normal. For the benefit of everyone on the dev
list, would it be possible for someone to explain the changes regarding the
sequence codec: its benefits, the differences, and where in the code to
look to see what it is actually doing. Don't need anything elaborate, just
a point of departure for inquiry.
MG
