Thanks Joern, I'll take a closer look.

On Wed, Mar 5, 2014 at 2:30 PM, Jörn Kottmann <[email protected]> wrote:

> Have a look at the Sequence Coding thread here on the list.
>
> The name finder has always used IOB2 coding by default. We have now made
> this configurable, so it can be replaced by other codecs such as BILOU
> or, when the work is done, by a user-implemented codec.
>
> To detect names in a sentence, the name finder uses a learnable
> classifier. The classifier has to decide whether a token is part of a
> name or not. The logic that determines which labels are used to
> encode/decode name spans is now the responsibility of the SequenceCodec
> object.
>
> In the IOB2 codec (see the BioCodec class) the tokens are labeled as
> Begin, Inside, or Other. Each new name span has to start with the Begin
> label.
>
> The BILOU codec uses the following labels: Begin, Inside, Last, Unit and
> Other.
>
> There might be advantages to switching the codec depending on the data
> you are using: in the German CONLL03 data the evaluation results are
> slightly better with BILOU than with IOB2.
>
> The BILOU codec uses more labels and will be more resource intensive
> than IOB2.
>
> Also have a look at the Wikipedia article about IOB:
> http://en.wikipedia.org/wiki/Inside_Outside_Beginning
>
> HTH,
> Jörn
>
>
> On 03/05/2014 02:18 PM, Mark G wrote:
>
>> Hello, I updated the tools trunk two days ago and stopped getting NER
>> results. I chatted with Joern and he made a change to the seq codec that
>> brought everything back to normal. For the benefit of everyone on the dev
>> list, would it be possible for someone to explain the changes regarding
>> the sequence codec: its benefits, the differences, and where in the code
>> to look to see what it is actually doing. Don't need anything elaborate,
>> just a point of departure for inquiry.
>> MG
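
For anyone following the thread, here is a rough sketch of how the two label sets described in the quoted message differ on the same name spans. It is only an illustration and not the OpenNLP code: the class LabelSchemeSketch, its methods, the single-letter labels, and the example sentence are all made up for this mail, and spans are assumed to be {start, end} token offsets with an exclusive end. The real logic lives behind SequenceCodec (see BioCodec for IOB2).

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only; this is NOT the OpenNLP SequenceCodec API.
 * Spans are {start, end} token offsets, end exclusive (an assumption).
 */
final class LabelSchemeSketch {

    /** IOB2: first token of each span is B, remaining span tokens are I, all others O. */
    static String[] encodeIob2(int tokenCount, int[][] spans) {
        String[] labels = new String[tokenCount];
        Arrays.fill(labels, "O");
        for (int[] span : spans) {
            labels[span[0]] = "B";
            for (int i = span[0] + 1; i < span[1]; i++) {
                labels[i] = "I";
            }
        }
        return labels;
    }

    /** BILOU: one-token spans are U; longer spans are B, then I for the middle, then L. */
    static String[] encodeBilou(int tokenCount, int[][] spans) {
        String[] labels = new String[tokenCount];
        Arrays.fill(labels, "O");
        for (int[] span : spans) {
            int start = span[0];
            int last = span[1] - 1; // index of the final token in the span
            if (start == last) {
                labels[start] = "U";
            } else {
                labels[start] = "B";
                for (int i = start + 1; i < last; i++) {
                    labels[i] = "I";
                }
                labels[last] = "L";
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up sentence with two name spans: tokens 0-1 and token 3.
        String[] tokens = {"Angela", "Merkel", "visited", "Paris", "today"};
        int[][] spans = {{0, 2}, {3, 4}};
        System.out.println(String.join(" ", tokens));
        System.out.println(String.join(" ", encodeIob2(tokens.length, spans)));
        System.out.println(String.join(" ", encodeBilou(tokens.length, spans)));
    }
}
```

On this input the IOB2 line comes out as B I O B O and the BILOU line as B L O U O. The extra Last and Unit labels are what give the classifier more outcomes to choose between, which fits the remark above that BILOU is more resource intensive.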
