Hello,

Background:
   I am developing a tool that uses OpenNLP.  I have a model that extends 
BaseModel, and several AbstractModels.  I allow the user (myself) to specify 
the TrainerType (GIS/QN) for each model by using a list of TrainingParameters.

Potential Bugs:

1)    Whenever I use QNTrainer, I get an error (number of threads < 1).  I 
think the problem is that the parameters are initialized in the isValid() 
method instead of the init() method.  This works for GIS because in the 
doTrain(DataIndexer) method, the number of threads is a local variable taken 
from the TrainingParameters, not a field in GIS.  This leads to another 
question: when is the isValid() method supposed to be called?  I am surprised 
that the TrainerFactory does not call it.
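To illustrate the pattern I mean (all class and method names below are 
made-up stand-ins, not the actual OpenNLP code), here is a minimal sketch of 
why populating a field inside isValid() is fragile: if nothing calls 
isValid() before training, the thread count is still zero.

```java
// Hypothetical stand-in for the pattern in question: a trainer whose
// thread count is only populated when isValid() happens to be called.
class SketchTrainer {
    static final String THREADS_PARAM = "Threads";
    static final int THREADS_DEFAULT = 1;

    private final java.util.Map<String, String> params;
    private int threads; // stays 0 until isValid() runs

    SketchTrainer(java.util.Map<String, String> params) {
        this.params = params;
        // If the field were initialized here (or in an init() method),
        // it would always be valid before training starts.
    }

    boolean isValid() {
        // Initialization hidden inside validation -- the bug-prone part.
        String t = params.getOrDefault(THREADS_PARAM,
                String.valueOf(THREADS_DEFAULT));
        threads = Integer.parseInt(t);
        return threads >= 1;
    }

    int train() {
        if (threads < 1) {
            throw new IllegalStateException("number of threads < 1");
        }
        return threads;
    }
}
```

Skipping isValid() and calling train() directly reproduces the 
"number of threads < 1" failure, while the GIS-style approach (reading the 
parameter locally inside train()) would not have this ordering dependency.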


2)    The psf (public static final) String variables used by the 
TrainingParameters are scattered all over the place.  The variables 
THREADS_(PARAM/DEFAULT) are defined in both QNTrainer and TrainingParameters, 
but they should be defined in only one place; AbstractTrainer may well be the 
best place to put THREADS_(PARAM/DEFAULT).  And it is not just these two 
variables: all of the training psf String variables from TrainingParameters 
are duplicated in AbstractTrainer.
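A minimal sketch of the consolidation I have in mind (again with invented 
names, not the real OpenNLP classes): declare each key once in the shared 
base class and let subclasses inherit it, so the two copies cannot drift 
apart.

```java
// Hypothetical consolidation: the parameter key and its default are
// defined exactly once, in the shared base class.
abstract class BaseTrainerSketch {
    public static final String THREADS_PARAM = "Threads";
    public static final String THREADS_DEFAULT = "1";
}

class QnTrainerSketch extends BaseTrainerSketch {
    // No re-declaration: the inherited constant is visible here, so a
    // subclass cannot accidentally define a mismatched duplicate key.
    static String threadsKey() {
        return THREADS_PARAM;
    }
}
```

Since static fields are inherited in Java, existing references through the 
subclass name would keep compiling after the duplicates are deleted.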


3)    Should the interface EventTrainer have a doTrain(DataIndexer) and a 
getDataIndexer method?  This is important to me because I extended 
OnePassDataIndexer to pre-assign the outputs.  I know the outputs a priori, 
and I want to quickly combine the results of the multiple models.  Since 
getEventTrainer returns an EventTrainer instead of an AbstractEventTrainer, I 
cannot call doTrain(DataIndexer).  I also cannot use 
doTrain(ObjectStream<Event>); it creates a new OnePassDataIndexer.
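Here is a self-contained sketch of the shape of the problem and the downcast 
workaround I am currently stuck with (the interfaces below only mirror the 
EventTrainer / AbstractEventTrainer split; their bodies are invented for 
illustration, not copied from OpenNLP):

```java
// Stand-in for the interface the factory exposes.
interface EventTrainerSketch {
    String train(String events);
}

// Stand-in for the abstract class that carries the extra entry point:
// training from a pre-built indexer rather than a raw event stream.
abstract class AbstractEventTrainerSketch implements EventTrainerSketch {
    abstract String doTrain(String indexer);

    public String train(String events) {
        // The interface path always builds its own indexer internally,
        // which is exactly what prevents supplying a custom one.
        return doTrain("indexed:" + events);
    }
}

class ConcreteTrainerSketch extends AbstractEventTrainerSketch {
    String doTrain(String indexer) {
        return "trained on " + indexer;
    }
}

class FactorySketch {
    // Mirrors a factory that returns the interface type, hiding doTrain.
    static EventTrainerSketch getEventTrainer() {
        return new ConcreteTrainerSketch();
    }
}
```

The workaround is an instanceof check plus a cast to the abstract class 
before calling doTrain with the custom indexer, but putting the 
indexer-accepting method on the interface itself would make the cast 
unnecessary.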


I am not sure whether these are bugs or deliberate design decisions (I can 
work around them if needed).  If they are bugs, I am happy to supply a fix.
Thank you,
Daniel
