Hi Peter,

Yes, Xerces' regular expression support is meant to be thread-safe. Can you
open a JIRA issue with your findings here [1]?

Thanks.

[1] http://issues.apache.org/jira/browse/XERCESJ

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]

"Peter Geraghty" <[email protected]> wrote on 06/03/2009
01:05:30 PM:

> I have encountered a sporadic failure of regular expression pattern
> matching in an application using Xerces RegularExpression.  Of the first
> 300,000 messages processed by this application in a new installation, one
> message was incorrectly reported as a match failure when it should have
> been a success, but on resubmission the same message was correctly
> reported as valid.
>
> The symptoms appear to indicate a thread safety problem and although I
> had understood RegularExpression to be thread-safe, looking at the code
> it does seem to be wrong.
>
> The application in question is using 2.9.0 but looking at
> RegularExpression.java in 2.9.1 the algorithm appears the same and is
> described below.
>
> The thread safety algorithm depends on a separate Context object being
> allocated, and referenced only from a local variable, if the Context
> object referenced by RegularExpression's instance variable is "inuse".
> For example, at line 1420:
>
>         synchronized (this.context) {
>             con = this.context.inuse ? new Context() : this.context;
>             con.reset(target, start, end, this.numberOfClosures);
>         }
>
> The "inuse" boolean is set to true by the reset method inside the
> synchronized code section above, however it is set to false in a
> non-synchronized section, prior to each return point, e.g., at line
> 1449.
>
>                 con.inuse = false;
>                 return true;
>
> The "inuse" boolean is not declared as volatile, and so I believe the
> absence of synchronization is wrong and makes this class NOT thread
> safe.
>
> E.g., it is vulnerable to what the JLS second edition called a
> "Prescient Store" optimisation taking place, which could explain the
> behaviour I am seeing - the inuse of this.context being set to false
> earlier than would be expected could lead to concurrent use of a Context
> object which is not thread safe.  Since all return points from methods
> like "matches" do set "inuse" to false, a "prescient store" optimisation
> to set it to false before actually performing the match is quite
> plausible.
>
> Although the term "prescient store" is not used in the JLS third edition
> I believe the semantics described there for non-volatile field access in
> non-synchronized code regions still allow this possibility of an
> optimisation re-ordering the clearing of the inuse flag so that it
> happens BEFORE the actual use of the Context object.
>
> I would welcome comment on whether you agree this is a bug and/or
> whether there are any other known thread-safety issues with
> RegularExpression.
>
> In terms of a solution, one possibility is to declare "inuse" as
> volatile, another is to use a synchronized "setInUse" method on the
> Context.
>
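To illustrate the second suggestion above, here is a minimal Java sketch in
which the flag is cleared through a synchronized setInUse method, so that the
release goes through the same monitor as the check in the synchronized block.
This is not the Xerces source: GuardedPattern, matchesDigits, sharedInUse and
the pos/limit fields are hypothetical names, and the digit scan is only a
stand-in for the real matching work. Declaring the flag volatile would be the
other option mentioned above.

```java
// Simplified, hypothetical stand-in for RegularExpression.Context.
final class Context {
    // For the shared instance this flag is only touched while holding
    // the Context's monitor; a freshly allocated Context is thread-confined.
    private boolean inuse;
    int pos;    // scratch state mutated while matching
    int limit;

    // Caller is expected to hold this Context's monitor (or own it exclusively).
    boolean isInUse() {
        return inuse;
    }

    void reset(int limit) {
        this.pos = 0;
        this.limit = limit;
        this.inuse = true;
    }

    // Synchronized release: pairs with the synchronized block that acquires.
    synchronized void setInUse(boolean value) {
        this.inuse = value;
    }
}

// Hypothetical stand-in for a RegularExpression that reuses one Context.
final class GuardedPattern {
    private final Context context = new Context();

    boolean matchesDigits(String target) {
        Context con;
        synchronized (this.context) {
            con = this.context.isInUse() ? new Context() : this.context;
            con.reset(target.length());
        }
        try {
            // Placeholder for the real matcher: scan using state held in con.
            while (con.pos < con.limit) {
                char c = target.charAt(con.pos);
                if (c < '0' || c > '9') {
                    return false;
                }
                con.pos++;
            }
            return true;
        } finally {
            // Clears the flag on every return path, under con's monitor.
            con.setInUse(false);
        }
    }

    // Test helper: reports whether the shared Context is currently marked in use.
    boolean sharedInUse() {
        synchronized (this.context) {
            return this.context.isInUse();
        }
    }
}
```

The try/finally is a design choice for the sketch: it keeps the release in
one place instead of repeating it before each return point.
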
> A third possibility would be to dispense with the approach of re-using
> Context objects via an instance variable reference, and always allocate
> a new Context referenced only from a local variable.  I also note that if
> this were done, and if the "prepare" method were invoked up front as part
> of setting the pattern rather than lazily on the first match, there would
> be no need for any kind of synchronization within the "matches" methods.
> This could be the best option for heavy concurrent use of a common
> pattern in a highly multithreaded environment, but of course has
> trade-offs in other regards.
>
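The third suggestion could look roughly like the Java sketch below, where all
mutable matching state lives in an object created per call and the pattern
data is fixed at construction. Again this is only an illustration under
assumptions, not Xerces code: PerCallPattern, ScanState and the "allowed
characters" pattern are hypothetical, and assigning the final field in the
constructor stands in for calling "prepare" up front.

```java
// Hypothetical per-call scratch state; never stored in a field, never shared.
final class ScanState {
    int pos;
    final int limit;

    ScanState(int limit) {
        this.limit = limit;
    }
}

// Hypothetical pattern whose shared data is immutable after construction.
final class PerCallPattern {
    // Set once in the constructor (the eager "prepare" step in this sketch).
    private final String allowed;

    PerCallPattern(String allowedChars) {
        this.allowed = allowedChars;
    }

    // No synchronization: the only mutable state is local to this call.
    boolean matches(String target) {
        ScanState state = new ScanState(target.length());
        while (state.pos < state.limit) {
            if (allowed.indexOf(target.charAt(state.pos)) < 0) {
                return false;
            }
            state.pos++;
        }
        return true;
    }
}
```

The trade-off noted above shows up here as one extra allocation per call and
the cost of preparing patterns that may never be used.
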
> Thanks.
>
> Pete
