Hello, I have implemented a number of new features for the name finder. These include Brown cluster features (duplicated per Brown path for each feature activated involving a token) and Clark cluster features (similar to the WordClusterFeatureGenerator currently available), among other extra local features that interact well with the clustering ones.
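To give a rough idea of what per-path Brown features can look like, here is a minimal sketch. It is not the actual implementation: the class name `BrownSketch`, the tiny in-memory lexicon, and the prefix lengths (4 and 6) are assumptions chosen only for illustration, and it assumes one feature is emitted per path prefix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BrownSketch {

    // Hypothetical prefix lengths; real systems pick these empirically.
    static final int[] PREFIX_LENGTHS = {4, 6};

    // Emits one feature per prefix of the token's Brown path (a bit string).
    // Tokens missing from the lexicon produce no features.
    static List<String> brownFeatures(String token, Map<String, String> lexicon) {
        List<String> features = new ArrayList<>();
        String path = lexicon.get(token);
        if (path == null) {
            return features;
        }
        for (int len : PREFIX_LENGTHS) {
            // Paths shorter than the prefix length are used whole.
            int end = Math.min(len, path.length());
            features.add("brown" + len + "=" + path.substring(0, end));
        }
        return features;
    }

    public static void main(String[] args) {
        // Hypothetical lexicon entry mapping a token to its Brown path.
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("London", "0110101");
        System.out.println(brownFeatures("London", lexicon));
    }
}
```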
I think it would be nice to include them before the new release. I will open an issue for each of them. What do you think?

In the meantime, I am testing these new features locally, but I have run into a number of issues/questions about how to extend the TokenNameFinderFactory:

1. I add the new features to the GeneratorFactory.
2. I create a new feature descriptor that uses some of the new features.
3. I extend the TokenNameFinderFactory and instantiate the subclass via the TokenNameFinderFactory.create(subclassName, featureGeneratorBytes, resources, sequenceCodec) method.
4. I override the TokenNameFinderFactory.createFeatureGenerators() method in the extended class.
5. At this point, I do not have access to featureGeneratorBytes because the TokenNameFinderFactory does not provide a getter, so I added one to the TokenNameFinderFactory class. Should we do this? Or am I extending the TokenNameFinderFactory the wrong way?
6. *Some* of the new features work. If an element name in the descriptor has no match in the GeneratorFactory, then TokenNameFinderFactory.createFeatureGenerators() returns null, and TokenNameFinderFactory.createContextGenerator() silently abandons the descriptor and falls back to NameFinderME.createFeatureGenerator(). Is this the desired behaviour? Perhaps we could add a log message somewhere to report the backoff to the default features when a descriptor element does not match?

Any comments appreciated.

Thanks,
Rodrigo
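
P.S. To make the logging suggestion in point 6 concrete, here is a rough, self-contained sketch of the kind of warning I have in mind. It does not use the real OpenNLP classes: `FallbackSketch`, `resolve`, and the element names are hypothetical stand-ins, and `java.util.logging` is assumed only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

public class FallbackSketch {

    private static final Logger LOG = Logger.getLogger(FallbackSketch.class.getName());

    // Hypothetical stand-in for the element names the generator factory knows.
    static final Set<String> KNOWN =
            new LinkedHashSet<>(Arrays.asList("window", "tokenclass", "brown"));

    // Hypothetical stand-in for the default feature set used on backoff.
    static final List<String> DEFAULTS = Arrays.asList("window", "tokenclass");

    // Returns the descriptor elements unchanged if all of them are known;
    // otherwise logs a warning naming the offending element and returns the defaults.
    static List<String> resolve(List<String> descriptorElements) {
        for (String element : descriptorElements) {
            if (!KNOWN.contains(element)) {
                LOG.warning("Unknown descriptor element '" + element
                        + "'; backing off to the default features.");
                return DEFAULTS;
            }
        }
        return descriptorElements;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Arrays.asList("window", "brown")));
        System.out.println(resolve(Arrays.asList("window", "clark")));
    }
}
```

The point is only that the backoff names the element that failed to match, so a typo in the descriptor is visible instead of silently changing the feature set.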
