See below for answers to your questions.

Status update: I've been running my patch in production for about 16 hours with no problems. I've restarted each of the three Tomcat instances once without any problems, but I also haven't detected any errors, either on send or on receive. I have some code that I used in dev to force an error on a specific combination of session attribute name and value. I'm going to put that in prod so that I can test how the patch behaves with a large volume of sessions and at least one error.


Mitch

On 09/21/2018 05:00 PM, Mark Thomas wrote:
On 21/09/18 18:02, Mitch Claborn wrote:
Please forgive me if this is the incorrect place or format for
discussing this. I'm new to trying to develop for Tomcat.

This is the right place. Welcome to the Tomcat community.

I'm developing a patch for DeltaManager and I'd like to discuss with you
developers if it could be considered for inclusion in the base code.
Please see details below and comment.

Will do. Please note that session replication is not an area I am
particularly familiar with so if some of my comments are a little
off-base I apologise.

Problem: When the "all sessions" message is sent from one node to
another, when the receiving node is first starting up, I often run into
various errors with one of the sessions and it fails to deserialize.
This causes all the remaining sessions in that chunk
(sendAllSessionsSize) to be lost by the receiver.

Oops.

The problem with the
sessions is totally an application problem, but until I can figure those
problems out and solve them I need a way to limit the impact of these
problems to just the one session that is in error. I could set
sendAllSessionsSize="1" but that would take a LONG time to transmit, and
we have many thousands of sessions at any given time.

That seems like a reasonable problem to try and solve.

Change details:

1. Update
    org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
    and
    org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
    to produce a more detailed error message when a session is in
    error.  New error message includes: the session index in the list of
    sessions, the session ID, the last field or attribute that was
    attempted to be read.

I'm not sure how useful the index will be but the other information
makes sense to me.

The index gives me an indication of how many sessions were discarded because of the error.
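As a rough illustration of the kind of error context described in item 1, here is a minimal sketch in plain Java. It is not the patch and does not use Tomcat's classes: `DetailedErrorDemo`, `FailsOnRead`, and the simplified stream layout (session count, then id and attributes per session) are all assumptions made for the example. It tracks the session index, the session ID, and the last item read, and reports how many later sessions in the chunk are lost.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical, simplified stand-in for the "all sessions" payload; not Tomcat code.
final class DetailedErrorDemo {

    /** Demo helper: serializes fine but always fails when read back. */
    static final class FailsOnRead implements Serializable {
        private static final long serialVersionUID = 1L;
        private void readObject(ObjectInputStream in) throws IOException {
            throw new IOException("forced failure");
        }
    }

    /** Writes the session count, then each session's id, attribute count and name/value pairs. */
    static byte[] write(Map<String, Map<String, Serializable>> sessions) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeInt(sessions.size());
            for (Map.Entry<String, Map<String, Serializable>> s : sessions.entrySet()) {
                out.writeUTF(s.getKey());
                out.writeInt(s.getValue().size());
                for (Map.Entry<String, Serializable> a : s.getValue().entrySet()) {
                    out.writeUTF(a.getKey());
                    out.writeObject(a.getValue());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Reads the payload back, keeping enough context to describe any failure. */
    static int read(byte[] data) {
        int count = -1;
        int index = 0;
        String id = null;
        String lastRead = "session count";
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            count = in.readInt();
            for (index = 0; index < count; index++) {
                id = null;
                lastRead = "session id";
                id = in.readUTF();
                lastRead = "attribute count";
                int attrs = in.readInt();
                for (int j = 0; j < attrs; j++) {
                    lastRead = "attribute name #" + j;
                    String name = in.readUTF();
                    lastRead = "attribute '" + name + "'";
                    in.readObject();
                }
            }
            return count;
        } catch (IOException | ClassNotFoundException | RuntimeException e) {
            // Everything after the failing session in this chunk cannot be read reliably.
            throw new IllegalStateException(String.format(
                    "Session %d of %d (id=%s) failed while reading %s; %d later session(s) lost",
                    index + 1, count, id, lastRead, Math.max(0, count - index - 1)), e);
        }
    }
}
```

With three sessions where the second holds a failing attribute named `cart`, the exception message names session 2 of 3, its ID, the attribute, and the one later session that was lost.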


2. Introduce new XML attribute verifySerializedSessions for DeltaManager.

Why would a user not want to enable this feature? The performance hit of
the additional deserialization on send?

That is the only reason I can think of.


3. If verifySerializedSessions="true",
    org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
    will first serialize each session then immediately deserialize it.
    If all is good, send the session as usual.  If any errors are
    encountered, create and send a dummy session with a known session ID
    instead. (This keeps the session count, which has already been put
    in the output stream, correct for the receiving node.)
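The verify-on-send idea in item 3 can be sketched roughly as follows. This is a simplified illustration in plain Java, not the actual patch: `VerifyOnSend`, `SimpleSession`, `FailsOnRead`, and the `DUMMY_ID` value are hypothetical names chosen for the example, and real DeltaSession serialization is more involved. Each session is serialized, read back immediately, and replaced by a placeholder if the round trip fails, so the number of entries sent never changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of verify-on-send; class and method names are not Tomcat's.
final class VerifyOnSend {
    static final String DUMMY_ID = "DUMMY-SESSION-ID"; // assumed marker value

    /** Minimal stand-in for a session: an id plus one attribute. */
    static final class SimpleSession implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final Serializable attribute;
        SimpleSession(String id, Serializable attribute) {
            this.id = id;
            this.attribute = attribute;
        }
    }

    /** Demo helper: serializes fine but always fails when read back. */
    static final class FailsOnRead implements Serializable {
        private static final long serialVersionUID = 1L;
        private void readObject(ObjectInputStream in) throws IOException {
            throw new IOException("forced failure");
        }
    }

    static byte[] toBytes(SimpleSession s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(s);
        }
        return bos.toByteArray();
    }

    static SimpleSession fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SimpleSession) in.readObject();
        }
    }

    /** Serializes every session; any session that fails the round trip becomes a placeholder. */
    static List<byte[]> serializeAll(List<SimpleSession> sessions) {
        List<byte[]> out = new ArrayList<>();
        for (SimpleSession s : sessions) {
            byte[] bytes;
            try {
                bytes = toBytes(s);
                fromBytes(bytes); // verification step: read it straight back
            } catch (IOException | ClassNotFoundException | RuntimeException e) {
                bytes = dummyBytes(); // keeps the entry count unchanged
            }
            out.add(bytes);
        }
        return out;
    }

    static byte[] dummyBytes() {
        try {
            return toBytes(new SimpleSession(DUMMY_ID, null));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience for inspecting the result: returns the id stored in a serialized entry. */
    static String idOf(byte[] bytes) {
        try {
            return fromBytes(bytes).id;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost is one extra deserialization per session on the sending node, which is the trade-off the proposed verifySerializedSessions attribute is meant to make optional.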

Ah. Is the issue that serialization works but deserialization does not?
That seems a little odd. Can you give an example of how this might go
wrong? I am trying to understand the root cause(s) of the problem to
determine if the proposed solution is appropriate. I thought
DeltaSession simply skipped over attributes that it could not deserialize.

DeltaSession does skip attributes that are not serializable. I've had three identifiable errors, none of which I could reproduce at will.

1. A session with a Vector<Long> that might have contained nulls. This should not be an issue, but I fixed my code to eliminate nulls in that Vector, since they should not be there anyway.

2. In some of my own objects where I do my own serialization with JSON, there were some fields that I don't serialize that were not marked transient that should have been. Some of those embedded objects were thus serialized by the native serialization and caused some problems. I fixed those.

3. In another of my objects that I serialize with JSON, the JSON string in the serialized session was obviously corrupted and was not a valid JSON hash. I went over the serialization code with a fine-tooth comb and it appears to be correct. That same code works hundreds of thousands of times a day without error.
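For the second case above, here is one way an unmarked field can cause trouble, shown as a small self-contained sketch. The classes (`TransientFieldDemo`, `Helper`, `LeakyState`, `FixedState`) are hypothetical, and the failure in the original application may well have looked different: this example only demonstrates that a field not marked transient is handed to native Java serialization, which fails if the field's type is not Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes for illustration only.
final class TransientFieldDemo {

    /** Not Serializable: stands in for an embedded object that should stay out of the stream. */
    static final class Helper { }

    /** The helper field is not transient, so native serialization tries to write it. */
    static final class LeakyState implements Serializable {
        private static final long serialVersionUID = 1L;
        String json = "{\"qty\":1}";
        Helper helper = new Helper();
    }

    /** Same shape, but the helper is transient and is skipped by native serialization. */
    static final class FixedState implements Serializable {
        private static final long serialVersionUID = 1L;
        String json = "{\"qty\":1}";
        transient Helper helper = new Helper(); // will be null after deserialization
    }

    /** Returns true if the object can be written and then read back. */
    static boolean roundTrips(Serializable o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                in.readObject();
            }
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false; // e.g. NotSerializableException for the non-transient Helper
        }
    }
}
```

Here `roundTrips(new LeakyState())` fails while `roundTrips(new FixedState())` succeeds; the transient field has to be rebuilt by the application after the object is read back.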

Especially in the case of #3, I suspect that there might be a concurrency issue - a session being modified in one request while it is being serialized in another.

FYI, bordering on TMI: I just recently switched to DeltaManager from a custom session sharing solution where I was doing my own persistence to a database, with no in-memory storage. Concurrency was not an issue in that setup because each request received an independent copy of the session content. I could have had concurrency issues all along and not known it.



4. Update
    org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
    to discard any received session that has the known dummy session ID.
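The receiving side of item 4 might look roughly like the sketch below. Again this is an illustration under assumptions, not the patch: `DiscardDummyOnReceive`, the `DUMMY_ID` value, and the reduced stream layout (a count followed by session IDs only) are made up for the example. The point is that the receiver still reads exactly as many entries as the count announces and simply drops any entry carrying the placeholder ID.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the receive side; names and stream layout are assumptions.
final class DiscardDummyOnReceive {
    static final String DUMMY_ID = "DUMMY-SESSION-ID"; // must match the sender's marker

    /** Sender side of the demo: writes the count first, then one id per session. */
    static byte[] writeChunk(List<String> sessionIds) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeInt(sessionIds.size());
            for (String id : sessionIds) {
                out.writeUTF(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Receiver side: consumes every announced entry but keeps only real sessions. */
    static List<String> readChunk(byte[] data) {
        List<String> accepted = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String id = in.readUTF();
                if (DUMMY_ID.equals(id)) {
                    continue; // placeholder for a session that failed verification on send
                }
                accepted.add(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return accepted;
    }
}
```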

This certainly looks like a problem that needs solving. I don't see any
obvious issues with the approach taken but I would like a better
understanding of the root causes of the deserialization failures as I am
wondering if there are alternative solutions that are worth considering.

Understood. My goal with this patch is to a) limit the negative effects of a serialization/deserialization error, and b) give more information about those errors so that the application can be fixed.


Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org


