Peter Veentjer wrote:
Hi Ted,
One of the easy-to-find problems is spinning on a non-volatile
variable without changing the value in the loop and without any additional
synchronization (at least, I didn't find any in a few seconds).
Examples:
AgentControllerSocketListener.closing
HttpConnector.stopMe
IndexUpdateReducer.closed
SecondaryNameNode.shouldRun
voTask.hasNext
These can all be fixed by making the fields volatile.
These are the easy ones that static analysis tools can find,
but I bet there are many more that are harder to find.
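To make the pattern concrete, here is a minimal sketch of a spinning worker that is stopped from another thread. StoppableWorker, its stopped field and stopsWithin are hypothetical stand-ins for fields like HttpConnector.stopMe, not code from Hadoop. Without volatile, the Java Memory Model allows the JIT to hoist the read of the flag out of the loop, so the worker may never observe stop(); whether it actually hangs depends on the JVM and timing.

```java
// Hypothetical worker illustrating the volatile stop-flag fix; not Hadoop code.
public class StoppableWorker implements Runnable {
    // volatile: every loop iteration re-reads the flag, and the write in
    // stop() is guaranteed to become visible to the spinning thread.
    private volatile boolean stopped = false;

    @Override
    public void run() {
        while (!stopped) {
            // spin; with a plain (non-volatile) field this loop may never exit
        }
    }

    public void stop() {
        stopped = true; // volatile write
    }

    // Starts a worker, lets it spin briefly, stops it, and reports whether
    // the thread terminated within the timeout.
    static boolean stopsWithin(long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(worker, "spinner");
        t.setDaemon(true); // don't keep the JVM alive if the flag is never seen
        t.start();
        try {
            Thread.sleep(50);
            worker.stop();
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsWithin(5000));
    }
}
```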
One of the problems with concurrency issues is that they are hard to
test for: it's hard to write tests that show the problem exists.
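As an illustration of why such tests are unreliable, here is a sketch of a typical stress test for a lost-update race. LostUpdateDemo and its counters are hypothetical. The AtomicInteger always reaches the expected total, while the plain int counter often comes up short, but not on every run or every machine, so a passing run proves nothing.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress test for a lost-update race; not Hadoop code.
public class LostUpdateDemo {
    static int racyCount;                                   // plain shared field
    static final AtomicInteger safeCount = new AtomicInteger();

    // Starts 'threads' threads that each increment both counters 'perThread'
    // times, then returns {racyCount, safeCount} once all have finished.
    static int[] run(int threads, int perThread) {
        racyCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    racyCount++;                 // read-modify-write, not atomic
                    safeCount.incrementAndGet(); // atomic
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // join() makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { racyCount, safeCount.get() };
    }

    public static void main(String[] args) {
        int[] counts = run(4, 100_000);
        // safe is always 400000; racy is often lower, but any single run may hit 400000
        System.out.println("racy=" + counts[0] + " safe=" + counts[1]);
    }
}
```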
Another is that the main services (namenode, secondary namenode, etc.)
all run in their own processes in production, so they can get away with
concurrency risks and shared static code that would be less acceptable
if several services shared one process.
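A hypothetical sketch of static shared state that is harmless with one service per JVM but breaks once two services share a process. The class, field and method names are invented for illustration. No threads are needed to show the problem; with concurrent startup it becomes a race as well.

```java
// Hypothetical illustration of process-wide static state; not Hadoop code.
public class SharedStaticStateDemo {
    // One value for the whole JVM, written by every service at startup.
    static String currentServiceName;

    static class Service {
        private final String name;

        Service(String name) {
            this.name = name;
            currentServiceName = name; // last writer wins for the whole JVM
        }

        // Reads the shared static: correct only if this is the only service in the JVM.
        String describeShared() { return "I am " + currentServiceName; }

        // Reads per-instance state: correct however many services share the JVM.
        String describeOwn() { return "I am " + name; }
    }

    public static void main(String[] args) {
        Service nn = new Service("namenode");
        Service snn = new Service("secondarynamenode");
        System.out.println(nn.describeShared()); // prints "I am secondarynamenode"
        System.out.println(nn.describeOwn());    // prints "I am namenode"
        System.out.println(snn.describeOwn());   // prints "I am secondarynamenode"
    }
}
```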
I think the Hadoop project would benefit from a structural approach to
solving these problems instead of just fixing these individual bugs. That
is what I want to help with, but I can't do it without the support of the
leading developers of the Hadoop community.
1. I don't see anyone being against this, though you would have to start
with education. For example, it took me a while to work out that you were
using JMM as an acronym for Java Memory Model.
2. I think we'd need to prioritise where the biggest risks are.
One of the things we need to agree on, for example, is making fields
that are only set in the constructor final. This makes analysis a lot
easier.
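A minimal sketch of what "final where possible" looks like, assuming a simple settings holder (ListenerSettings and withPort are hypothetical names). The compiler rejects any reassignment of a final field, and under the Java Language Specification's final-field semantics (JLS 17.5) any thread that obtains a reference to the object after its constructor finishes sees the initialized values without extra synchronization, provided `this` did not escape during construction.

```java
// Hypothetical immutable value class; not taken from the Hadoop source.
public final class ListenerSettings {
    // final: assigned exactly once, in the constructor; later assignment
    // is a compile error, so readers never need to reason about mutation.
    private final String host;
    private final int port;

    public ListenerSettings(String host, int port) {
        this.host = host;
        this.port = port;
        // Do not let 'this' escape here (e.g. by registering a listener),
        // or other threads may see the fields before they are set.
    }

    public String getHost() { return host; }
    public int getPort() { return port; }

    // "Changing" a value produces a new object instead of mutating this one.
    public ListenerSettings withPort(int newPort) {
        return new ListenerSettings(host, newPort);
    }
}
```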
It does, but it also makes subclassing trickier, as subclasses don't
get a look-in or an opportunity to override the values. Even if the
parent calls overridable methods to compute those values, the fact that
these methods are called from the parent's constructor makes them a
risk all on their own.
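A short sketch of the constructor risk, with hypothetical class names. The parent assigns a final field from an overridable method; the subclass override runs before the subclass's own fields are initialized, so it sees the default value 0. One common workaround, shown as SafeBase/SafeSub, is to compute the value before calling super(...) and pass it in.

```java
// Hypothetical classes illustrating overridable calls from a constructor.
public class ConstructorOverrideDemo {
    static class Base {
        protected final int bufferSize;

        Base() {
            bufferSize = computeBufferSize(); // overridable call from constructor
        }

        protected int computeBufferSize() { return 4096; }
    }

    static class Sub extends Base {
        private final int multiplier;

        Sub(int multiplier) {
            super();                  // Base() runs first and calls the override
            this.multiplier = multiplier;
        }

        @Override
        protected int computeBufferSize() {
            return 1024 * multiplier; // multiplier is still 0 at this point
        }
    }

    // Safer variant: the subclass computes the value and passes it up.
    static class SafeBase {
        protected final int bufferSize;

        SafeBase(int bufferSize) { this.bufferSize = bufferSize; }
    }

    static class SafeSub extends SafeBase {
        SafeSub(int multiplier) { super(1024 * multiplier); }
    }

    public static void main(String[] args) {
        System.out.println(new Sub(8).bufferSize);     // prints 0
        System.out.println(new SafeSub(8).bufferSize); // prints 8192
    }
}
```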