Is the SequenceValidator the only thing we need to change? If a corpus uses
BILOU, would the formatters need to convert it to IOB2?
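
To make the question concrete: going from BILOU to IOB2 looks mechanical,
roughly as in the sketch below. The class and method names are made up for
illustration and are not existing OpenNLP API, and it assumes tags of the
form B-X, I-X, L-X, U-X and O.

```java
import java.util.Arrays;

// Hypothetical helper, not part of OpenNLP: maps BILOU tags to IOB2 tags.
final class BilouToIob2 {

    // Assumes tags look like "B-PER", "I-PER", "L-PER", "U-PER" or "O".
    static String[] convert(String[] bilou) {
        String[] iob2 = new String[bilou.length];
        for (int i = 0; i < bilou.length; i++) {
            String tag = bilou[i];
            if (tag.startsWith("L-")) {
                // The last token of a multi-token span is plain "inside" in IOB2.
                iob2[i] = "I-" + tag.substring(2);
            } else if (tag.startsWith("U-")) {
                // A single-token span simply begins here in IOB2.
                iob2[i] = "B-" + tag.substring(2);
            } else {
                // B-, I- and O carry over unchanged.
                iob2[i] = tag;
            }
        }
        return iob2;
    }

    public static void main(String[] args) {
        String[] bilou = {"B-PER", "L-PER", "O", "U-LOC"};
        // prints [B-PER, I-PER, O, B-LOC]
        System.out.println(Arrays.toString(convert(bilou)));
    }
}
```

The other direction (IOB2 to BILOU) would need one token of lookahead to
tell L from I and U from B, so it is not quite as trivial.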



2014-02-19 7:01 GMT-03:00 Jörn Kottmann <[email protected]>:

> Hi all,
>
> the chunker and name finder both use IOB2 sequence coding. The logic
> to do that is hard coded in both components.
>
> I would like to suggest that we introduce a SequenceCodec interface to
> abstract this code and make it replaceable with different sequence codecs.
> This will allow us to reuse the sequence codec in both components, and
> make it replaceable with other sequence codecs such as BILOU.
>
> On my NER test datasets the F-Measure went up or down by around 1%
> depending on the machine learner and data set with BILOU coding compared
> to IOB2 coding.
>
> I didn't do any testing in the chunker.
>
> Any opinions? Is it worth the effort?
>
> Jörn
>
