On Thu, 2016-10-27 at 15:49 +0000, Russ, Daniel (NIH/CIT) [E] wrote:
>
> Comment 2:
> Do you have a preference where the variable should go? I think
> AbstractTrainer is the appropriate place for PSF variable dealing
> with ALL trainers, so Threads_(P/D) should be there. I would remove
> and refactor out of TrainingParams.

TrainingParameters is the class that parses the params file that is
passed in. It has to know about "Algorithm"; all the other parameters
are specific to the trainer implementation. I think AbstractTrainer is
probably a good place for PSF variables which deal with many/most
trainers.

> Comment 3:
> Right I want to change the dataindexer.
>
> So I have multiple models that classify data (Job descriptions) into
> Occupational Codes. I know what the codes are a priori, and even if
> they are not in the training data, I need to make sure that there is
> SOME probability for the codes. More importantly for each job
> description, I need to compare the probabilities returned for each
> output. By forcing the output indices to have the same values, I can
> quickly compare them without re-mapping the output.
>
> I tried to extend OnePassDataIndex, but the indexing occurs during
> object construction, so I cannot set the known outputs before
> indexing occurs.
>
> Of course I would not need the getDataIndexer() method, but it is
> defined in the Abstract class, why not in the Interface

The thing is that with the current interface we can support
implementations which don't use the Data Indexer. This can be the case
when an implementation relies on external machine learning libraries.
Since 1.6.0 we have pluggable ML support.

I looked closer now: getDataIndexer is a factory method for the Data
Indexer. Maybe it would make sense to allow specifying a custom class
for data indexing as part of the training parameters? Then the trainers
that use the Data Indexer can just support that mechanism.

Jörn
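
A minimal sketch of the "custom indexer class in the training parameters"
idea, assuming hypothetical names throughout: the parameter keys
DataIndexerClass and KnownOutcomes, the OutcomeIndexer interface and
FixedOutcomeIndexer are made up for illustration and are not part of the
OpenNLP API. The point is only that the indexer is constructed first,
configured from the parameters, and indexes the data afterwards, so the
known outcomes can claim fixed indices before any event is seen.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexerSketch {

    // Hypothetical parameter keys; not existing OpenNLP training parameters.
    static final String INDEXER_CLASS_PARAM = "DataIndexerClass";
    static final String KNOWN_OUTCOMES_PARAM = "KnownOutcomes";

    /** Hypothetical stand-in for an indexer contract; not the OpenNLP DataIndexer. */
    public interface OutcomeIndexer {
        void init(Map<String, String> params);      // configure before any indexing
        void index(List<String> observedOutcomes);  // assign indices to outcomes seen in the data
        String[] getOutcomeLabels();                // position in the array == outcome index
    }

    /** Reserves indices for outcomes known a priori, in the configured order. */
    public static class FixedOutcomeIndexer implements OutcomeIndexer {
        // LinkedHashMap keeps insertion order, so iteration order matches the indices.
        private final Map<String, Integer> indexByOutcome = new LinkedHashMap<>();

        public FixedOutcomeIndexer() { }  // no-arg constructor so it can be created reflectively

        @Override
        public void init(Map<String, String> params) {
            String known = params.get(KNOWN_OUTCOMES_PARAM);
            if (known == null || known.isEmpty()) {
                return;
            }
            for (String outcome : known.split(",")) {
                indexByOutcome.putIfAbsent(outcome.trim(), indexByOutcome.size());
            }
        }

        @Override
        public void index(List<String> observedOutcomes) {
            // Outcomes reserved in init() keep their index; unseen ones are appended.
            for (String outcome : observedOutcomes) {
                indexByOutcome.putIfAbsent(outcome, indexByOutcome.size());
            }
        }

        @Override
        public String[] getOutcomeLabels() {
            return indexByOutcome.keySet().toArray(new String[0]);
        }
    }

    /** Factory: instantiate the class named in the parameters, then configure it. */
    static OutcomeIndexer createIndexer(Map<String, String> params) {
        // Defaulting to FixedOutcomeIndexer is a choice made for this sketch only.
        String className = params.getOrDefault(
            INDEXER_CLASS_PARAM, FixedOutcomeIndexer.class.getName());
        try {
            OutcomeIndexer indexer = Class.forName(className)
                .asSubclass(OutcomeIndexer.class)
                .getDeclaredConstructor()
                .newInstance();
            indexer.init(params);
            return indexer;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create indexer: " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(INDEXER_CLASS_PARAM, FixedOutcomeIndexer.class.getName());
        params.put(KNOWN_OUTCOMES_PARAM, "CODE_A,CODE_B,CODE_C");  // placeholder codes

        OutcomeIndexer indexer = createIndexer(params);
        indexer.index(Arrays.asList("CODE_C", "CODE_A"));  // CODE_B never occurs in the data
        System.out.println(String.join(",", indexer.getOutcomeLabels()));
    }
}
```

Two models trained with the same KnownOutcomes value would then share
outcome indices, so their probability arrays could be compared position
by position without re-mapping.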