>>>>> "Keith" == Keith James <[EMAIL PROTECTED]> writes:
>>>>> "Phillip" == Phillip Lord <[EMAIL PROTECTED]> writes:
>>>>> "Matthew" == Matthew Pocock <[EMAIL PROTECTED]> writes:

  Matthew> SymbolList should be behaving like a string over its
  Matthew> symbols. It is silly if it doesn't do this. Hash codes
  Matthew> should really be calculated in a different (but
  Matthew> sequence-dependent) way to avoid scanning the whole of
  Matthew> very large sequences just to do a hash lookup. Anyone got
  Matthew> any ideas?

  Phillip> Just make the hash out of, say, the first 10 elements in
  Phillip> the list. The hashcode is not meant to be unique for all
  Phillip> sequences, it's just a performance enhancement. So long as
  Phillip> equals returns false for different sequences, then there
  Phillip> is no problem.

  Keith> In a similar vein, the array sampling techniques at
  Keith> http://www273.pair.com/med/columns/Durable6.html
  Keith> would work, but equals would get called more often for
  Keith> sequences with similar base composition. How about first 10
  Keith> and then add in values for just the indices that are powers
  Keith> of two?

Probably a good idea to factor in the size of the Alphabet as well. If
there are only a few symbols you get much more chance of a collision,
because there are only a few unique values for the elements.

You will still get problems, though, if the sequence underneath changes
while you are using it as a hash key.

Right, I really am going back to lurking now.

Phil

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
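
For anyone who wants to try the idea from this thread, here is a minimal sketch of the sampled hash being discussed: mix in the alphabet size and the sequence length, then the first 10 elements, then only the elements at power-of-two indices. It is not BioJava code. The class name SampledSymbolHash, the method sampledHash, and the representation of symbols as int indices into the alphabet are all assumptions made for illustration.

```java
// Hypothetical helper, not part of BioJava: symbols are assumed to be
// represented as int indices into their alphabet.
final class SampledSymbolHash {
    // Number of leading elements that are always hashed (the "first 10").
    private static final int PREFIX = 10;

    private SampledSymbolHash() {}

    static int sampledHash(int[] symbols, int alphabetSize) {
        int h = 17;
        // Factor in the alphabet size and the sequence length.
        h = 31 * h + alphabetSize;
        h = 31 * h + symbols.length;

        // Hash the first PREFIX elements (or fewer for short sequences).
        int prefixEnd = Math.min(PREFIX, symbols.length);
        for (int i = 0; i < prefixEnd; i++) {
            h = 31 * h + symbols[i];
        }

        // Then sample only the power-of-two indices beyond the prefix:
        // 16, 32, 64, ... A long counter avoids overflow when doubling.
        for (long i = 16; i < symbols.length; i <<= 1) {
            h = 31 * h + symbols[(int) i];
        }
        return h;
    }
}
```

The cost is roughly 10 + log2(length) element reads rather than a full scan. Two sequences that differ only at an unsampled position get the same hash, so equals still has to do the real comparison. The value also depends on the contents, so changing the underlying sequence while it is a key in a hash table will still lose the entry.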
