On Thu, Dec 13, 2001 at 10:33:52PM +1300, Mark Schreiber wrote:
> Hi -
>
> When adding a large number of counts to a Distribution via a trainer I
> have found it is much quicker to store the counts in an array (indexed by
> the AlphabetIndex for that alphabet), increment the counts as each symbol
> comes in, and then add the counts to the trainer at the end (followed by
> the .train() method).
>
> I'm curious as to why this is. I assume it's because the trainer checks the
> validity of each symbol, although technically so does the AlphabetIndex by
> looking up the index for the symbol.
>
> Not that this is a major issue; it might just be a way to speed up
> distribution training.

Do you know which implementation of Distribution you're using?
SimpleDistribution uses a fairly sensible DistributionTrainer object
(which uses an AlphabetIndex and an array -- pretty much the same as you
are). However, I notice that there's also something called
SimpleDistributionTrainer. This stores counts in a Map<Symbol, Double>,
and I suspect it is likely to be /much/ less efficient -- especially as
there's object churn every time a new count is added.

If the distribution you're using is still using a
SimpleDistributionTrainer, I'd guess that could cause some fairly
dreadful performance.

Thomas.
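
To make the difference between the two storage strategies concrete, here is a minimal sketch in plain Java. It does not use the BioJava API: the class CountStorageSketch, the four-letter char alphabet, and the methods indexOf, countWithArray and countWithMap are all hypothetical stand-ins, with char playing the role of Symbol and indexOf playing the role of an alphabet index lookup. It only illustrates the general pattern, not what either trainer class actually does internally.

```java
import java.util.HashMap;
import java.util.Map;

public class CountStorageSketch {
    // Hypothetical stand-in for a finite alphabet; not a BioJava type.
    static final char[] ALPHABET = {'a', 'c', 'g', 't'};

    // Hypothetical index lookup: maps a symbol to its slot in the counts array
    // and rejects symbols that are not in the alphabet.
    static int indexOf(char symbol) {
        switch (symbol) {
            case 'a': return 0;
            case 'c': return 1;
            case 'g': return 2;
            case 't': return 3;
            default:
                throw new IllegalArgumentException("Unknown symbol: " + symbol);
        }
    }

    // Array-backed counting: one primitive double per symbol, updated in place.
    static double[] countWithArray(String seq) {
        double[] counts = new double[ALPHABET.length];
        for (int i = 0; i < seq.length(); i++) {
            counts[indexOf(seq.charAt(i))] += 1.0;
        }
        return counts;
    }

    // Map-backed counting: every update hashes the key, unboxes the old value
    // and boxes the new total, which typically allocates a fresh Double.
    static Map<Character, Double> countWithMap(String seq) {
        Map<Character, Double> counts = new HashMap<>();
        for (int i = 0; i < seq.length(); i++) {
            Character key = seq.charAt(i);
            Double old = counts.get(key);
            counts.put(key, old == null ? 1.0 : old + 1.0);
        }
        return counts;
    }

    public static void main(String[] args) {
        String seq = "acgtacgtaacc";
        double[] byArray = countWithArray(seq);
        Map<Character, Double> byMap = countWithMap(seq);
        for (int i = 0; i < ALPHABET.length; i++) {
            System.out.println(ALPHABET[i] + " array=" + byArray[i]
                    + " map=" + byMap.get(ALPHABET[i]));
        }
    }
}
```

Both methods produce the same counts; the array version simply avoids the per-update hashing and boxing, which is one plausible reason the manual array approach feels faster when many counts are added.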