ThreadContext set-to-list tweak

David Blevins Thu, 06 Jun 2019 17:36:44 -0700

Doychin, thank you for so many code improvements!  It's very excellent to have 
your eyes on the code and see so many good commits.


Consider this a very late PR review :)  I did one tweak I wanted to run by you 
to see what thoughts you had.  I switched this from a Set back to a List:

 - 
https://github.com/apache/tomee/commit/4273365ef2902191e6a1fe025daa8877cbefc193#diff-cebe7eb85461eb2f286e293fea91a020

I don't think this is the resting state of the code, probably we need to do one 
of two things.

The hesitation is effectively on executing code in hash order.  My experience 
is that it isn't random enough to prevent code from truly becoming 
order-independent.  Some of the worst bugs I've wasted hours or days on turned 
out to be code that people reported as working most the time and would 
"randomly" fail.  In the end they would often turn out to be due to hashes and 
code that was written to be order independent, but actually wasn't.

One of the worst was a "bug" in CXF where if two classes had the same 
@Path("/foo") at the top, one would "randomly" be chosen based on hash order 
because the classes were in a hashset.  In practice the order was incredibly 
stable for many restarts (jvm instances) in a row and the "right" service was 
selected.  I could easily get 4 restarts in a row and see no issues, then the 
fifth "bam" it would "break."

Overall it took a 50 to 100 restarts with a debugger attached and about 4 solid 
hours to find the issue.  I could only reproduce the issue 50% of that time, so 
2 of those hours were wasted.  In the 50% where I could reproduce the issue, 
getting nearer to the source of the problem was much harder because I wouldn't 
know if that time the issue was going to show up.  You never knew if the time 
spent stepping through code was going to be for nothing as many that jvm 
instance wouldn't have the issue.

In the end, I decided there were three bugs and the worst was the last one:

 1. The user had a bug because you can't put @Path("/foo") on one class and the 
exact @Path("/foo") on another class
 2. CXF had a bug because it did not tell the user you cannot do this and ran 
the incorrect code anyway
 3. CXF had another bug because it was written to execute code potentially 
critical code in a random order

In the end *I* had zero bugs but was the guy losing 4 hours.  The reason I 
decided #3 was the worst was because of this:

 3. If the execution order had been stable, there would not have been a "bug".  
Whatever behavior the user saw (good or bad) would have been 100% consistent 
and reliable.  It would have always worked or never worked.

     -if always: If the user liked the behavior, I'd have never been called.
     -if never: If the user didn't like the behavior, I would have been able to 
reproduce it every time and find the issue likely 3x faster.

Effectively the randomness became a both a bug-cost-applifier and 
bug-obfuscator.  Since hashcodes (and therefore the execution order) are stable 
for the duration of the vm and you don't restart that often, it can be weeks or 
months before you notice any issues.

The realization I came to is that code should never be executed in hash order, 
it should always be a stable order.  It will either work or not work on the 
first try.

From that perspective, the code in either set or list form is ultimately not 
stable and needs fixing.  You were smart to see the list as a lie, because it 
is fed from static initializers so ultimately dependent on class-loading order 
which isn't guaranteed.

What we should do is add a guarantee in the order.  The fix for CXF (which I 
never made unfortunately) would have been to first sort the list of JAX-RS 
classes by class name, then select one by the path.  So then if the code 
"worked" and you pushed it into production it would always work.  It might 
break again in the future, but only due to code change.

We should probably do something like that in this code given how critical it 
is.  If order is unimportant, then the outside code (the 
ThreadContextListeners) probably don't mind if the ThreadContext sorts the list 
using some ordering of it's choosing so it will always at least be stable.

So on ThreadContext I changed the set back to a list, but did not implement 
something that would re-sort the list each time a listener is added or removed. 
 I wanted to first share the above and see if this is something you might want 
to hack on together.

Let me know if you do, because I think it would be incredibly fun to hack on 
one of the most important classes in the entire codebase with you. :)

On the sorting, we could probably use classname to determine order.  The trick 
would be the thread-safety.  We would need to rethink how we do that.  Sorting 
the list while another thread might be modifying it would just lead to an 
unstable sort.  Maybe an AtomicReference or something.  Dunno.

Anyway, huge email.  Sorry for the book :)  Love to hear your thoughts! 


-- 
David Blevins
http://twitter.com/dblevins
http://www.tomitribe.com

ThreadContext set-to-list tweak

Reply via email to