At 08:23 PM 12/28/2004, Curt Arnold wrote:

On Dec 28, 2004, at 7:50 AM, Ceki G�lc� wrote:
The sequenceNumber+timestamp combination makes it easy to implement

equals() method for LoggingEvent.


The implementation might be easy, but appears problematic.  The default
implementation of equals and getHashCode would be preferred if all you
cared about was object identity.

Object identity is not enough. Chainsaw as well as other receivers may access a source of LoggingEvent multiple times. In that case, we need an efficient way to discard duplicate LoggingEvents.

It may be preferable to move the definition of "close enough for chainsaw" or "close enough for queries" out into the consuming classes. I'll have to dig into this later since I'm on the road and not really set up for this type of code analysis.


If log4j 1.2.8 logging events are deserialized, it looks like they may all have the same sequence number so chainsaw might want to additional
checks if the timestamps are the same and the sequenceNumber is the default value.


By overriding equals, you also are required to override hashCode (which you do) which ideally distributes the hash values uniformly over the value space of int. By my reading, it looks like the hash code is an XOR of the high 32 bits of the timestamp (despite the comment) and the sequence number. If I'm reading it right, that looks like all logging events with a given sequenceNumber within a fairly large timeframe (4 million seconds, 11 days or so) would have the same hash code. Even it it is fixed to use the lower 32-bits, the hash codes still would be fairly tightly clustered. We could write a better hashCode, I'm just pointing out one of the side effects of choosing to override equals.


The code currently says:

/**
* The hashcode is computed as XOR of the lower 32 bits of sequenceNumber and
* the higher 32 bits of timeStamp;
*/
public int hashCode() {
return (int)((timeStamp >> 32) ^ (sequenceNumber & 0xFFFFFFFF));
}


In what way does the comment not match the code?

The hashcode does not need to uniformly spread out the returned values over the range of integer. It just has to make a minimal effort to avoid that many values do not pile up on top of each other. Given that all hash tables algorithms compute the hash value over modulo 'n' where 'n' is a small prime number, for all practical purposes the following hashcode would have been amply sufficient.

 public int hashCode() {
    return (int)((sequenceNumber & 0xFFFFFFFF));
  }

We are doing a little better by XORing with the higher 32 bits of the timestamp. (2 to 32 milliseconds is about 50 days.) A slightly better implementation would have been


public int hashCode() { return (int)((timeStamp >> 20) ^ (sequenceNumber & 0xFFFFFFFF)); }

2^20 millis = 1048576 millis or about 17 minutes.


p.s. LoggingEvent has commented out methods xgetMDCCopy and XXgetMDCKeySet. Are these going to be deleted or uncommented?

They'll eventually be deleted.


-- Ceki G�lc�

  The complete log4j manual: http://www.qos.ch/log4j/



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to