To answer your original question though, the string's hash function *has* to be based on the string value, not the object identity. (So interning or lack thereof will not affect you.) This is because the String class overrides Equals. Any class that overrides Equals is *required* to override GetHashCode too and to make the two consistent. (I.e. if Equals says two objects are the same, they must have the same hash values.)
However, just because two strings have the same hash code doesn't mean they are the same - GetHashCode *is* allowed to return false positives. (It's just false negatives that aren't allowed.) In fact here is a guaranteed valid-for-every-occasion implementation of GetHashCode(): public override int GetHashCode() { return 1; } This doesn't break any rules. Of course it won't yield great performance if you use this on objects which are acting as keys in a HashTable... And it'll get you rather a lot of false positives for comparison. :-) But *any* hash has some chance of collision, unless the hash actually has at least as many bits as the data being hashed. The larger the hash, the lower the probability of collision will be (given a suitably good hash algorithm of course...) but the probability never drops to zero. MD5 is believed to be a good hash algorithm, and it has a length of 128 bytes (i.e. 1024 bits) IIRC, which gives a pretty low chance of a collision. But the question I would ask is are you actually wasting more space by storing MD5 hashes than you would be by just remembering the strings in the first place? Unless your strings are on average longer than 128 bytes (the size of an MD5 hash) then it will actually be rather more expensive to do what you're doing than just to store all the strings... Or is there some reason you can't store the strings? -- Ian Griffiths DevelopMentor ----- Original Message ----- From: "Erick Thompson" <[EMAIL PROTECTED]> > Upon further reflection, I think that using GetHashCode isn't a good idea to > check if I've seen a string before. I think I'll use a MD5 sig instead, so I > don't have to worry about collisions. > > Erick > > ----- Original Message ----- > From: "Erick Thompson" <[EMAIL PROTECTED]> > > > I need to check if a particular string has been seen before, so I > > was going to maintain a list of hashcodes from the strings. > > However, I want to confirm that the GetHashCode method > > returns a hashcode based on the string contents, not on the > > string object. The reason I'm not completely sure is that I'm > > not totally confident I know how string interning will effect things. > > Can I use GetHashCode to identify a string? You can read messages from the DOTNET archive, unsubscribe from DOTNET, or subscribe to other DevelopMentor lists at http://discuss.develop.com.