Sequence coding

Hi all,

The chunker and name finder both use IOB2 sequence coding. The logic
to do that is hard-coded in both components.

I would like to suggest that we introduce a SequenceCodec interface to
abstract this code and make it replaceable with different sequence codecs.

This will allow us to reuse the sequence codec in both components, and
make it replaceable with other sequence codecs such as BILOU.
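
To make the idea more concrete, here is a rough sketch of what such an
interface could look like. Everything in it is an assumption for the sake
of discussion and not existing code: the Span record, the encode/decode
method names and signatures, and the two codec classes are made up for
this example (it needs Java 16+ for records). It only illustrates that
IOB2 and BILOU can sit behind the same interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not an existing API: all names here are illustrative.
class SequenceCodecSketch {

    /** A typed span over token indices; start inclusive, end exclusive. */
    record Span(int start, int end, String type) {}

    /** Converts between spans and one tag per token. */
    interface SequenceCodec {
        String[] encode(List<Span> spans, int length);
        List<Span> decode(String[] tags);
    }

    /** IOB2: B- on the first token of a span, I- on the rest, O elsewhere. */
    static final class Iob2Codec implements SequenceCodec {
        public String[] encode(List<Span> spans, int length) {
            String[] tags = new String[length];
            Arrays.fill(tags, "O");
            for (Span s : spans) {
                tags[s.start()] = "B-" + s.type();
                for (int i = s.start() + 1; i < s.end(); i++) tags[i] = "I-" + s.type();
            }
            return tags;
        }

        public List<Span> decode(String[] tags) {
            List<Span> spans = new ArrayList<>();
            int start = -1;
            String type = null;
            // Run one step past the end so an open span is always closed.
            for (int i = 0; i <= tags.length; i++) {
                String tag = i < tags.length ? tags[i] : "O";
                boolean continues = start >= 0 && tag.equals("I-" + type);
                if (start >= 0 && !continues) {
                    spans.add(new Span(start, i, type));
                    start = -1;
                }
                if (tag.startsWith("B-")) {
                    start = i;
                    type = tag.substring(2);
                }
            }
            return spans;
        }
    }

    /** BILOU: B-/I-/L- for multi-token spans, U- for single tokens, O elsewhere. */
    static final class BilouCodec implements SequenceCodec {
        public String[] encode(List<Span> spans, int length) {
            String[] tags = new String[length];
            Arrays.fill(tags, "O");
            for (Span s : spans) {
                if (s.end() - s.start() == 1) {
                    tags[s.start()] = "U-" + s.type();
                    continue;
                }
                tags[s.start()] = "B-" + s.type();
                for (int i = s.start() + 1; i < s.end() - 1; i++) tags[i] = "I-" + s.type();
                tags[s.end() - 1] = "L-" + s.type();
            }
            return tags;
        }

        public List<Span> decode(String[] tags) {
            List<Span> spans = new ArrayList<>();
            int start = -1;
            for (int i = 0; i < tags.length; i++) {
                String tag = tags[i];
                if (tag.startsWith("U-")) {
                    spans.add(new Span(i, i + 1, tag.substring(2)));
                    start = -1;
                } else if (tag.startsWith("B-")) {
                    start = i;
                } else if (tag.startsWith("L-") && start >= 0) {
                    spans.add(new Span(start, i + 1, tag.substring(2)));
                    start = -1;
                } else if (tag.equals("O")) {
                    start = -1; // lenient: drop a span that was never closed
                }
            }
            return spans;
        }
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(new Span(0, 2, "PER"), new Span(3, 4, "LOC"));
        // prints [B-PER, I-PER, O, B-LOC, O]
        System.out.println(Arrays.toString(new Iob2Codec().encode(spans, 5)));
        // prints [B-PER, L-PER, O, U-LOC, O]
        System.out.println(Arrays.toString(new BilouCodec().encode(spans, 5)));
    }
}
```

With something like this, the chunker and name finder would only talk to
SequenceCodec, and switching the coding would mean passing in a different
implementation.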

On my NER test datasets the F-Measure went up or down by around 1%,
depending on the machine learner and data set, with BILOU coding compared
to IOB2 coding.

I didn't do any testing in the chunker.

Any opinions? Is it worth the effort?

Jörn
