Is the SequenceValidator the only thing we need to change? And if a corpus uses BILOU, would the formatters need to convert it to IOB2?
2014-02-19 7:01 GMT-03:00 Jörn Kottmann <[email protected]>:

> Hi all,
>
> the chunker and name finder both use IOB2 sequence coding. The logic
> to do that is hard coded in both components.
>
> I would like to suggest that we introduce a SequenceCodec interface to
> abstract this code and make it replaceable with different sequence codecs.
> This will allow us to reuse the sequence codec in both components, and
> make it replaceable with other sequence codecs such as BILOU.
>
> On my NER test datasets the F-Measure went up or down by around 1%
> depending on the machine learner and data set with BILOU coding compared
> to IOB2 coding.
>
> I didn't do any testing in the chunker.
>
> Any opinions? Is it worth the effort?
>
> Jörn
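Here is a minimal Java sketch (Java 16+) of what a replaceable sequence codec could look like. All names in it (`SequenceCodec`, `Iob2Codec`, `BilouCodec`, `Span`, `encode`, `decode`) are hypothetical and chosen for illustration. It is not OpenNLP's actual API. The sketch assumes spans are non-overlapping, half-open token ranges with a type string, and that tags look like `B-type`. Because every codec maps spans to per-token outcomes and back, converting a BILOU corpus to IOB2 comes down to decoding with one codec and encoding with the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are OpenNLP's real API.
public class SequenceCodecSketch {

    /** A typed entity span over token indices [start, end). */
    record Span(int start, int end, String type) {}

    /** Maps between entity spans and one outcome label per token. */
    interface SequenceCodec {
        String[] encode(List<Span> spans, int length);
        List<Span> decode(String[] outcomes);
    }

    /** IOB2: B- on the first token of every span, I- on the rest, O outside. */
    static final class Iob2Codec implements SequenceCodec {
        public String[] encode(List<Span> spans, int length) {
            String[] out = new String[length];
            Arrays.fill(out, "O");
            for (Span s : spans) { // assumes spans do not overlap
                out[s.start()] = "B-" + s.type();
                for (int i = s.start() + 1; i < s.end(); i++) out[i] = "I-" + s.type();
            }
            return out;
        }

        public List<Span> decode(String[] outcomes) {
            List<Span> spans = new ArrayList<>();
            int start = -1;
            String type = null; // type of the currently open span, or null
            for (int i = 0; i < outcomes.length; i++) {
                String o = outcomes[i];
                boolean continues = type != null && o.equals("I-" + type);
                if (!continues && type != null) { // close the open span
                    spans.add(new Span(start, i, type));
                    type = null;
                }
                if (o.startsWith("B-")) { start = i; type = o.substring(2); }
            }
            if (type != null) spans.add(new Span(start, outcomes.length, type));
            return spans;
        }
    }

    /** BILOU: B- begin, I- inside, L- last token, U- single-token span, O outside. */
    static final class BilouCodec implements SequenceCodec {
        public String[] encode(List<Span> spans, int length) {
            String[] out = new String[length];
            Arrays.fill(out, "O");
            for (Span s : spans) { // assumes spans do not overlap
                if (s.end() - s.start() == 1) { out[s.start()] = "U-" + s.type(); continue; }
                out[s.start()] = "B-" + s.type();
                for (int i = s.start() + 1; i < s.end() - 1; i++) out[i] = "I-" + s.type();
                out[s.end() - 1] = "L-" + s.type();
            }
            return out;
        }

        public List<Span> decode(String[] outcomes) {
            List<Span> spans = new ArrayList<>();
            int start = -1;
            String type = null; // type of the currently open span, or null
            for (int i = 0; i < outcomes.length; i++) {
                String o = outcomes[i];
                String t = o.length() > 2 ? o.substring(2) : null;
                boolean sameType = type != null && type.equals(t);
                if (o.startsWith("U-")) { spans.add(new Span(i, i + 1, t)); type = null; }
                else if (o.startsWith("B-")) { start = i; type = t; }
                else if (o.startsWith("L-") && sameType) { spans.add(new Span(start, i + 1, t)); type = null; }
                else if (!(o.startsWith("I-") && sameType)) type = null; // O or an invalid tag
            }
            return spans;
        }
    }

    public static void main(String[] args) {
        // "Pierre Vinken joined Sony ." with a person span and an organization span
        List<Span> spans = List.of(new Span(0, 2, "person"), new Span(3, 4, "organization"));
        SequenceCodec bilou = new BilouCodec();
        SequenceCodec iob2 = new Iob2Codec();
        String[] bilouTags = bilou.encode(spans, 5);
        // BILOU -> IOB2 conversion: decode with one codec, encode with the other
        String[] iob2Tags = iob2.encode(bilou.decode(bilouTags), bilouTags.length);
        System.out.println(Arrays.toString(bilouTags));
        System.out.println(Arrays.toString(iob2Tags));
    }
}
```

Running `main` prints `[B-person, L-person, O, U-organization, O]` followed by `[B-person, I-person, O, B-organization, O]`. With an abstraction along these lines, a formatter could decode a corpus in whatever coding it uses instead of rewriting it to IOB2 up front. Each codec would probably also need to provide its own sequence validator, because the valid transitions differ. In BILOU, for example, `L-x` may only follow `B-x` or `I-x`.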
