I have encountered a sporadic failure of regular expression pattern
matching in an application using Xerces RegularExpression.  Of the first
300,000 messages processed by this application in a new installation one
message incorrectly reported a match failure when it should have been a
success, but on resubmitting the same message was correctly reported as
valid.  

The symptoms appear to indicate a thread safety problem and although I
had understood RegularExpression to be thread-safe, looking at the code
it does seem to be wrong.

The application in question is using 2.9.0 but looking at
RegularExpression.java in 2.9.1 the algorithm appears the same and is
described below.

The thread safety algorithm depends on a separate Context object being
allocated on the stack if the Context object  referenced by
RegularExpression's instance variable is "inuse".  For example, line
1420.

        synchronized (this.context) {
            con = this.context.inuse ? new Context() : this.context;
            con.reset(target, start, end, this.numberOfClosures);
        }

The "inuse" boolean is set to true by the reset method inside the
synchronized code section above, however it is set to false in a
non-synchronized section, prior to each return point, e.g., at line
1449.

                con.inuse = false;
                return true;

The "inuse" boolean is not declared as volatile, and so I believe the
absence of synchronization is wrong and makes this class NOT thread
safe.

E.g., it is vulnerable to what the JLS second edition called a
"Prescient Store" optimisation taking place, which could explain the
behaviour I am seeing - the inuse of this.context being set to false
earlier than would be expected could lead to concurrent use of a Context
object which is not thread safe.  Since all return points from methods
like "matches" do set "inuse" to false, a "prescient store" optimisation
to set it to false before actually performing the match is quite
plausible.

Although the term "prescient store" is not used in the JLS third edition
I believe the semantics described there for non-volatile field access in
non-synchronized code regions still allow this possibility of an
optimisation re-ordering the clearing of the inuse flag so that it
happens BEFORE the actual use of the Context object.

I would welcome comment on whether you agree this is a bug and/or
whether there are any other known thread-safety issues with
RegularExpression.

In terms of a solution, one possibility is to declare "inuse" as
volatile, another is to use a synchronized "setInUse" method on the
Context.

A third possibility would be to dispense with the approach of re-using
Context objects via an instance variable reference, and always allocate
a Context on the stack.  I also note that if this was done, and if the
"prepare" method was not invoked lazily on the first match but was
invoked up front as part of setting the pattern, there would be no need
for any kind of synchronization within the "matches" methods.  This
could give the optimum for heavy concurrent use of a common pattern in
highly-multithreaded environment, but of course has trade-offs in other
regards.

Thanks.

PeteDisclaimer: 
 
The contents of this E-mail plus any attachment is intended for the use of the 
addressee only and is confidential, proprietary and may be privileged. It will 
not be 
binding upon Trace Group or any group company (Trace).  Opinions, conclusions, 
contractual obligations and other information in this message in so far as they 
relate to 
the official business of Trace must be specifically confirmed in writing by 
Trace. If you 
are not the intended recipient you must not copy this message or attachment, 
use or 
disclose the contents to any other person, but are requested to telephone or 
E-mail 
the sender and delete the message and any attachment from your system. Trace 
takes all reasonable precautions to ensure that no virus or defect is 
transmitted via 
this e mail, however Trace accepts no responsibility for any virus or defect 
that might 
arise from opening this E-mail or attachments.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to