Hello, Background: I am developing a tool that uses OpenNLP. I have a model that extends BaseModel, and several AbstractModels. I allow the user (myself) to specify the TrainerType (GIS/QN) for each model by using a list of TrainingParameters.
Potential Bugs: 1) Whenever I use QNTrainer, I get an error (number of threads <1). I think the problem is that the parameters are initialized in the isValid() method instead of the init() method. This works for GIS because in the doTrain(DataIndexer) method, the number of threads is a local variable taken from the TrainingParameters not a field in GIS. This leads to another question. When it the isValid() method supposed to be called? I am surprised that the TrainerFactory does not call it. 2) The psf (public static final) String variables used by the TrainingParameters are all over the place. The variables THEADS_(PARAM/DEFAULT) are defined in both QNTrainer and TrainingParameters. It should be defined in one of the places. I am not sure that AbstractTrainer isn’t the best place to put THREADS_(P/D). It isn’t just the variables Threads_(P/D), All the Training psf String variables from TrainingParameters are duplicated in AbstractTrainer. 3) Should the Interface EventTrainer have a doTrainDataIndexer and a getDataIndexer method? This is important to me because I extended OnePassDataIndexer to pre-assign the outputs. I know the outputs aprori, and I want to quick combine the results of the multiple models. Since the getEventTrainer returns an EventTrainer instead of an AbstractEventTrainer, I cannot call doTrain(DataIndexer). I cannot use the doTrain(ObjectStream<Event>); it creates a new OnePassIndexer. I am not sure if these are bugs or specifically designed plans (I can work around if needed). If these are bugs, I am happy to supply a fix. Thank you, Daniel