>>>>> "Keith" == Keith James <[EMAIL PROTECTED]> writes:

>>>>> "Phillip" == Phillip Lord <[EMAIL PROTECTED]> writes:

>>>>> "Matthew" == Matthew Pocock <[EMAIL PROTECTED]> writes:

  Matthew> SymbolList should be behaving like a string over its
  Matthew> symbols. It is silly if it doesn't do this. Hash codes
  Matthew> should really be calculated in a different (but
  Matthew> sequence-dependent) way to avoid scanning the whole of very
  Matthew> large sequences just to do a hash lookup. Anyone got any
  Matthew> ideas?

  Phillip> Just make the hash out of say the first 10 elements in the
  Phillip> list. The hashcode is not meant to be unique for all
  Phillip> sequences, it's just a performance enhancement. So long as
  Phillip> equals returns false for different sequences, then there is
  Phillip> no problem.

  Keith> In a similar vein, the array sampling techniques at

  Keith> http://www273.pair.com/med/columns/Durable6.html

  Keith> would work, but equals would get called more often for
  Keith> sequences with similar base composition. How about first 10
  Keith> and then add in values for just the indices that are powers
  Keith> of two?

It would probably be a good idea to factor in the size of the Alphabet
as well. If there are only a few symbols you get a much higher chance
of a collision, because there are only a few unique values for the
elements.
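
As a rough sketch of the sampling idea in this thread (first 10
elements, then the power-of-two indices, with the sequence length and
the alphabet size mixed in), something like the following might do.
SampledHash, hash and the alphabetSize parameter are made-up names for
illustration, and a plain java.util.List stands in for SymbolList; none
of this is BioJava API.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of BioJava: hashes a sample of a sequence
// rather than every element, so the cost grows with log(n), not n.
final class SampledHash {
    // "First 10 elements", as suggested in the thread.
    private static final int PREFIX = 10;

    // symbols:      the sequence, modelled here as a plain List (assumption)
    // alphabetSize: number of distinct symbols in the alphabet (assumption)
    static int hash(List<?> symbols, int alphabetSize) {
        int n = symbols.size();

        // Mix in the alphabet size and the sequence length up front.
        int h = 31 * alphabetSize + n;

        // The first PREFIX elements, or fewer for short sequences.
        int limit = Math.min(PREFIX, n);
        for (int i = 0; i < limit; i++) {
            h = 31 * h + Objects.hashCode(symbols.get(i));
        }

        // Power-of-two indices beyond the prefix: 16, 32, 64, ...
        // A long counter avoids int overflow on very large sequences.
        for (long i = 1; i < n; i <<= 1) {
            if (i < PREFIX) {
                continue; // already covered by the prefix loop
            }
            h = 31 * h + Objects.hashCode(symbols.get((int) i));
        }
        return h;
    }
}
```

Positions that are never sampled do not affect the result, so two
sequences that differ only there will collide, and equals still has to
compare every element.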

You will still get problems, though, if the underlying sequence changes
while you are using it as a hash key.
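
To illustrate the hazard, here is a small sketch using only the
standard java.util collections, with an ArrayList of characters standing
in for a mutable sequence (an assumption; this is not SymbolList).
MutableKeyDemo and its method names are invented for the example. The
second method shows one possible workaround, keying the map on an
unmodifiable snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: shows what happens when a key's contents
// (and therefore its hash code) change after it was put into a HashMap.
final class MutableKeyDemo {

    // Mutates the key after insertion, then looks it up again.
    static boolean foundAfterMutation() {
        List<Character> key = new ArrayList<>(Arrays.asList('a', 'c', 'g', 't'));
        Map<List<Character>, String> map = new HashMap<>();
        map.put(key, "some annotation");

        key.set(0, 't'); // the key's hash code changes while it is in the map

        // The entry was filed under the old hash code, so the lookup
        // is expected to miss it.
        return map.containsKey(key);
    }

    // Uses an unmodifiable copy as the key, so later edits to the
    // original sequence cannot disturb the map.
    static boolean foundWithSnapshot() {
        List<Character> seq = new ArrayList<>(Arrays.asList('a', 'c', 'g', 't'));
        List<Character> snapshot =
                Collections.unmodifiableList(new ArrayList<>(seq));
        Map<List<Character>, String> map = new HashMap<>();
        map.put(snapshot, "some annotation");

        seq.set(0, 't'); // the snapshot is a separate copy and is unaffected

        // Look up by the original contents.
        return map.containsKey(Arrays.asList('a', 'c', 'g', 't'));
    }
}
```

The snapshot costs a full copy of the sequence, which may well be too
expensive for very large sequences; it is shown only to make the
trade-off concrete.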

Right, I really am going back to lurking now. 

Phil
_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
