Hi Peter, Yes, Xerces' regular expression support is meant to be thread-safe. Can you open a JIRA issue with your findings here [1]?
Thanks.

[1] http://issues.apache.org/jira/browse/XERCESJ

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]

"Peter Geraghty" <[email protected]> wrote on 06/03/2009 01:05:30 PM:

> I have encountered a sporadic failure of regular expression pattern
> matching in an application using Xerces RegularExpression. Of the first
> 300,000 messages processed by this application in a new installation one
> message incorrectly reported a match failure when it should have been a
> success, but on resubmitting the same message was correctly reported as
> valid.
>
> The symptoms appear to indicate a thread safety problem and although I
> had understood RegularExpression to be thread-safe, looking at the code
> it does seem to be wrong.
>
> The application in question is using 2.9.0 but looking at
> RegularExpression.java in 2.9.1 the algorithm appears the same and is
> described below.
>
> The thread safety algorithm depends on a separate Context object being
> allocated on the stack if the Context object referenced by
> RegularExpression's instance variable is "inuse". For example, line
> 1420.
>
> synchronized (this.context) {
>     con = this.context.inuse ? new Context() : this.context;
>     con.reset(target, start, end, this.numberOfClosures);
> }
>
> The "inuse" boolean is set to true by the reset method inside the
> synchronized code section above, however it is set to false in a
> non-synchronized section, prior to each return point, e.g., at line
> 1449.
>
> con.inuse = false;
> return true;
>
> The "inuse" boolean is not declared as volatile, and so I believe the
> absence of synchronization is wrong and makes this class NOT thread
> safe.
>
> E.g., it is vulnerable to what the JLS second edition called a
> "Prescient Store" optimisation taking place, which could explain the
> behaviour I am seeing - the inuse of this.context being set to false
> earlier than would be expected could lead to concurrent use of a Context
> object which is not thread safe. Since all return points from methods
> like "matches" do set "inuse" to false, a "prescient store" optimisation
> to set it to false before actually performing the match is quite
> plausible.
>
> Although the term "prescient store" is not used in the JLS third edition
> I believe the semantics described there for non-volatile field access in
> non-synchronized code regions still allow this possibility of an
> optimisation re-ordering the clearing of the inuse flag so that it
> happens BEFORE the actual use of the Context object.
>
> I would welcome comment on whether you agree this is a bug and/or
> whether there are any other known thread-safety issues with
> RegularExpression.
>
> In terms of a solution, one possibility is to declare "inuse" as
> volatile, another is to use a synchronized "setInUse" method on the
> Context.
>
> A third possibility would be to dispense with the approach of re-using
> Context objects via an instance variable reference, and always allocate
> a Context on the stack. I also note that if this was done, and if the
> "prepare" method was not invoked lazily on the first match but was
> invoked up front as part of setting the pattern, there would be no need
> for any kind of synchronization within the "matches" methods. This
> could give the optimum for heavy concurrent use of a common pattern in
> highly-multithreaded environment, but of course has trade-offs in other
> regards.
>
> Thanks.
>
> Pete
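
For anyone following the thread, the third option Peter describes (prepare up front, allocate a fresh Context per call) might look roughly like the sketch below. This is not the Xerces source: `LiteralPattern`, its nested `Context`, and the literal-substring matching are hypothetical stand-ins chosen only to keep the example short. The point is the shape: all shared state is final and set in the constructor, and all mutable scratch state lives in an object reachable only from the calling thread, so `matches` needs no synchronization.

```java
// Hypothetical sketch, NOT the Xerces RegularExpression implementation.
// It matches a literal substring only, to keep the threading shape visible.
final class LiteralPattern {

    /** Per-call scratch state, playing the role of Context in the message above. */
    private static final class Context {
        final String target;
        final int limit;
        int pos; // mutable, but only ever seen by the thread that created it

        Context(String target, int start, int end) {
            this.target = target;
            this.pos = start;
            this.limit = end;
        }
    }

    // "Prepared" eagerly in the constructor and never mutated afterwards.
    // Being final, it is safely visible to any thread that obtains this object.
    private final char[] literal;

    LiteralPattern(String literal) {
        this.literal = literal.toCharArray();
    }

    /** Reports whether the literal occurs within target[start, end). */
    boolean matches(String target, int start, int end) {
        // Fresh context on every call: no shared "inuse" flag, no lock.
        Context con = new Context(target, start, end);
        for (; con.pos + literal.length <= con.limit; con.pos++) {
            if (matchesAt(con)) {
                return true;
            }
        }
        return false;
    }

    private boolean matchesAt(Context con) {
        for (int i = 0; i < literal.length; i++) {
            if (con.target.charAt(con.pos + i) != literal[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off Peter mentions still applies: this allocates one small object per call, and eager preparation does work at construction time even if the pattern is never used. The first two options (a volatile "inuse", or a synchronized setter) would instead keep the reuse of a shared Context and fix only the visibility of the flag.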
