Re: Status of == vs equals() RESULTS
I agree with your decision based on the tests. Using == for string comparison would be risky and would bring little gain.

Eric

On Tue, Aug 24, 2010 at 2:11 PM, Chad La Joie laj...@itumi.biz wrote:

Okay, I'll prepare a patch for you by the end of the week.

On 8/24/10 2:23 PM, Colm O hEigeartaigh wrote:

Sounds fine to me.

Colm.

On Mon, Aug 23, 2010 at 8:55 PM, Chad La Joie laj...@itumi.biz wrote:

Okay, getting back to this. I tried my tests again, this time with:

- a 7.5MB SAML metadata document (so lots of comparisons)
- 100 warm-up runs, then 100 timed runs
- an explicit GC between each run, to keep it from happening during the runs since the DOMs were so large

No real difference in results. equals() was faster.

So, at this point, I can't see any reason to do anything other than equals(). It's the actual correct way of doing the comparison, in that it will always return the proper result, and the JVM definitely seems to be optimizing its use.

On 8/10/10 7:53 AM, Chad La Joie wrote:

Okay, I certainly have a number of SAML documents lying around so I'll try with those as well. And, of course, I'll report back the results I get.

On 8/10/10 4:46 AM, Raul Benito wrote:

As the original author of the change from equals() to == for interned namespaces, I can tell you that originally, in 1.4 and 1.5 and with my data (the verification of a SAML/Liberty AuthnReq in multi-threaded tests, with the old Juice JCE provider), the change was 10% to 20% faster. SAML is one of the real examples of signing, and it has some URLs with common prefixes and the same length. The Juice provider also helps to get rid of the signing/digest cost (a verification is two c14n operations, one of the signed part and one of the signature), but I think a single c14n is a good way of measuring it.

Also take into account that the == vs equals() debate is more of a memory workload and cache problem: if we have to iterate over every char again and again just to see that two strings are not equal, we thrash the cache. (That's why I used multiple threads, to simulate a server decoding requests with more or less the same code, but at different times and under different workloads.)

Nevertheless, if you have tested with a more modern JRE and .equals() behaves better, just go ahead and kiss the == goodbye.

Clive, using .hashCode() for strings in this case is not a big speed-up, as it is going to go through all the chars of the string, thrashing the cache again, and multiplying and adding the result into an integer, instead of failing at the first different char or just reducing to a boolean.

Regards,

On Tue, Aug 10, 2010 at 2:37 AM, Clive Brettingham-Moore xml...@brettingham-moore.net wrote:

Have to agree .equals is the way to go, since the correctness of == is too reliant on what must be considered implementation optimisations in the parser. Benchmarking in the JVM is notoriously difficult, but it does look like there is no gross difference, which should kill any objections to doing it correctly.

Since I recently spent far too long researching this for an unrelated problem, I'll add my 10c to the detailed discussion.

On 10/08/10 01:23, Chad La Joie wrote:

> Not necessarily, there are a number of not-equal checks in there that should, in theory, perform better if you use only ==. In such a case, the use of != will just be a single check while !equals() will result in a char-by-char comparison.

Actually, the next thing String.equals tests is length equality, so the character comparison will only be reached if the strings are the same length. Since the char-by-char comparison returns on the first mismatch, only same-length strings with shared prefixes will show the expected slowness (namespace URIs are likely to share prefixes, but I think they are not particularly likely to be the same length unless actually equal). Thus String.equals is only likely to be slow when comparing long strings that are distinct objects but equal in content (so the interning or alternative string pooling techniques needed for == also benefit .equals, without all the nasty loopholes: even if .equals is occasionally slow, at least it is always right).

In circumstances where repeated tests are done with many length and prefix matches, adding a hash code inequality test ((s1.hashCode() == s2.hashCode()) && s1.equals(s2)) could prevent practically all char-by-char checks for !equal cases (but if the same strings are never used repeatedly, the hash code calculation could be an issue; NB intern results in a hash calculation for all strings anyway). Pooling is still needed to speed up matches for equality, though.

Re VM options, I would feel -server is definitely the right test bed, both because of the more aggressive JIT and because the code is likely to see its heaviest real-world use in -server VMs.

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
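Clive's hash-code pre-check, quoted above, can be sketched roughly as follows. This is an illustration only, not code from the library: the class and method names (HashPrecheck, sameString) are made up for the example. It relies on the guarantee that equal strings have equal hash codes, so a hash mismatch proves inequality, while a hash match still has to be confirmed by equals(). Raul's objection still applies: the first hashCode() call on a string has to read every character.

```java
final class HashPrecheck {
    /** Returns true when both strings have the same content (hypothetical helper). */
    static boolean sameString(String s1, String s2) {
        if (s1 == s2) {
            return true; // same object; also covers the case where both are null
        }
        if (s1 == null || s2 == null) {
            return false;
        }
        // Different hash codes prove the strings differ, so equals() is skipped.
        // Equal hash codes prove nothing, so equals() still has the final say.
        return s1.hashCode() == s2.hashCode() && s1.equals(s2);
    }

    public static void main(String[] args) {
        String a = "http://www.w3.org/2000/09/xmldsig#";
        String b = new String(a); // equal content, different object
        System.out.println(a == b);           // false
        System.out.println(sameString(a, b)); // true
        System.out.println(sameString(a, "http://www.w3.org/2001/04/xmlenc#")); // false
    }
}
```

Whether the pre-check pays off depends on how often the same strings are compared again, which is the point Clive and Raul disagree on.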
Re: Status of == vs equals() RESULTS
Sounds fine to me.

Colm.
Re: Status of == vs equals() RESULTS
Okay, I'll prepare a patch for you by the end of the week.

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
Re: Status of == vs equals() RESULTS
Okay, getting back to this. I tried my tests again, this time with:

- a 7.5MB SAML metadata document (so lots of comparisons)
- 100 warm-up runs, then 100 timed runs
- an explicit GC between each run, to keep it from happening during the runs since the DOMs were so large

No real difference in results. equals() was faster.

So, at this point, I can't see any reason to do anything other than equals(). It's the actual correct way of doing the comparison, in that it will always return the proper result, and the JVM definitely seems to be optimizing its use.

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
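The procedure Chad describes in this message (warm-up runs, then timed runs, with an explicit GC between runs) could be sketched along these lines. This is not his actual test code; TimingHarness and its methods are hypothetical names, and the task being timed is left to the caller. Note that System.gc() is only a request, so the JVM may still collect during a timed run.

```java
final class TimingHarness {
    /** Runs the task untimed for the warm-up count, then returns the duration in nanoseconds of each timed run. */
    static long[] measure(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // give the JIT a chance to compile the hot paths before timing
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            System.gc(); // a hint only; the JVM is free to ignore it
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Returns {min, max, median, avg, total} for a non-empty sample array. */
    static long[] summarize(long[] samples) {
        long[] sorted = samples.clone();
        java.util.Arrays.sort(sorted);
        int n = sorted.length;
        long total = 0;
        for (long s : sorted) {
            total += s;
        }
        long median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
        return new long[] { sorted[0], sorted[n - 1], median, total / n, total };
    }
}
```

A call such as TimingHarness.summarize(TimingHarness.measure(task, 100, 100)) would then mirror the 100 warm-up and 100 timed runs mentioned above, with the canonicalization wrapped in the task.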
Re: Status of == vs equals() RESULTS
I would prefer if we stuck to the original plan of making sure == comparisons are only done for namespaces, in a single piece of pluggable code. However, I think we should now revert to making the .equals comparison the default for the next release, given that there is no compelling reason to do otherwise. Anyone who wants to experiment with getting a performance increase can just plug the other piece of code in.

Thoughts?

Colm.

On Mon, Aug 9, 2010 at 11:07 PM, Chad La Joie laj...@itumi.biz wrote:

I guess I didn't explicitly say this, but if, after a few days, people can't suggest an issue with this testing methodology or provide testing inputs that show different results, I'll rip out the helper class I added and just use equals() everywhere. That'll make the code a lot nicer to read.

On 8/9/10 10:19 AM, Chad La Joie wrote:

So, I have some unexpected results from this work. I implemented a helper class that checked the equality of element local names, attribute local names, namespace URIs, and namespace prefixes (i.e. everything that Xerces always interns). Then I made sure to replace all ==, != and equals() that I could find with the appropriate call.

To test, I picked the Canonicalizer20010315ExclusiveTest test case and made two alterations to the test22*excl methods:

- do one c14n operation outside the timing loop, just to make sure all the classes are in memory, constants are loaded, etc.
- in a 100-iteration loop, create a new canonicalizer, canonicalize a DOM tree, and time it using nanosecond time

I did this for the example2_2_1.xml[1], example2_2_2.xml[2], and example2_2_3.xml[3] input files (test221excl, test222excl, test223excl respectively).

Here are the results, measured in nanosecond timing. total indicates the total time spent in all 100 runs, i.e. the summation of each of the 100 results.

test221excl:
         equals()   ==
min      101000     99000
max      123000     191000
median   103000     105000
avg      103760     106540
total    10376000   10654000

test222excl:
         equals()   ==
min      99000      101000
max      192000     128000
median   10         108000
avg      102110     108480
total    10211000   10848000

test223excl (an XPath nodeset canonicalization):
         equals()   ==
min      254000     248000
max      29         353000
median   266000     265000
avg      266820     265800
total    26682000   2658

So, what these numbers appear to suggest is that, in fact, equals() is more often faster than ==. This seems counter-intuitive unless the JVM has a specialized optimization for the String.equals() method. Can anyone see where my testing is likely to be flawed?

[1] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
[2] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
[3] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup

On 8/2/10 10:11 AM, Chad La Joie wrote:

So, while I don't have my access yet, Colm asked me if I'd take a look at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3]).

My executive summary is that clearly, as things stand, the current code favors optimization over correctness. Rarely is this a good thing.

Colm notes[4] that the reliance on intern'ed strings (and thus the ability to use ==) occurs sporadically throughout the code and not just within the ElementChecker implementations. He specifically mentioned the various C14N implementations, and indeed == is used about 6 times there for string comparison.

My recommendation then is twofold:

- Ensure that nothing other than namespace bits are compared via ==. I don't know that this occurs, but the code should definitely be reviewed to ensure that.
- Create a new NamespaceEqualityChecker that provides methods for checking the various bits of a namespace (URIs, prefixes) and use it anywhere that either == or equals() is used today. Implementations based on == and on equals() would be provided, with the default implementation being equals()-based. A configuration option should then be made available to control which impl gets used. Additionally, it might even be possible to add some smarts that could detect known good parsers that use interning and automatically use the ==-based implementation.

I do not recommend changing any part of the code without addressing the whole codebase (i.e. all the =='s need to be fixed or no change should be made) because of the possibility of creating new, unwanted effects. The current functionality is undesirable, but better the devil you know.

I think that this should be addressed in the upcoming 1.4.4 release. If quick consensus can be reached, I'm willing to do the work within a window of time I have available over the next 2-3 weeks.

[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
[2]
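The pluggable checker recommended in the 8/2 message might look something like the sketch below. Only the name NamespaceEqualityChecker comes from the thread; the method names, the two implementing classes, and the system property used for configuration are assumptions made for this illustration, not the patch that was eventually written.

```java
interface NamespaceEqualityChecker {
    boolean sameUri(String uri1, String uri2);
    boolean samePrefix(String prefix1, String prefix2);
}

/** Default: content comparison, correct regardless of which parser built the DOM. */
final class EqualsChecker implements NamespaceEqualityChecker {
    public boolean sameUri(String a, String b) { return a == null ? b == null : a.equals(b); }
    public boolean samePrefix(String a, String b) { return a == null ? b == null : a.equals(b); }
}

/** Opt-in: identity comparison, only valid when the parser interns these strings. */
final class IdentityChecker implements NamespaceEqualityChecker {
    public boolean sameUri(String a, String b) { return a == b; }
    public boolean samePrefix(String a, String b) { return a == b; }
}

final class NamespaceCheckers {
    /** Hypothetical property name, used only for this illustration. */
    static final String PROPERTY = "example.namespace.identityComparison";

    /** Picks the ==-based checker only when the property is explicitly set to "true". */
    static NamespaceEqualityChecker fromConfig() {
        return Boolean.getBoolean(PROPERTY) ? new IdentityChecker() : new EqualsChecker();
    }
}
```

Keeping every comparison behind one interface is what makes the later decision cheap: switching the default, or removing the == variant entirely, touches a single class.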
Re: Status of == vs equals() RESULTS
I encountered a problem before (version 1.4) caused by Apache Java code that uses == for namespace comparison. In my own code, when adding a DOM node to a document, I have to create the namespace using a string from the Apache classes. That is, I cannot directly use http://www.w3.org/2000/09/xmldsig# as the namespace String; instead I need to use APACHE.BlashClass.DSIG_URI. The bug is not only hard to find, it also unnecessarily ties unrelated DOM code to XML Security.

Eric
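The kind of failure Eric reports at the top of this message can be reproduced without any XML library. In the sketch below, DSIG_URI is a local constant standing in for the library's constant, and runtimeCopy simulates a namespace URI that arrives from a source that does not intern its strings; both names are assumptions for the example.

```java
final class InternPitfall {
    // Stand-in for a library constant; string literals are interned by the JVM.
    static final String DSIG_URI = "http://www.w3.org/2000/09/xmldsig#";

    /** Simulates a URI built at runtime, e.g. by a parser that does not intern. */
    static String runtimeCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String fromParser = runtimeCopy(DSIG_URI);
        System.out.println(fromParser == DSIG_URI);          // false: different objects
        System.out.println(fromParser.equals(DSIG_URI));     // true: same characters
        System.out.println(fromParser.intern() == DSIG_URI); // true: intern() returns the pooled instance
    }
}
```

Code that compares with == therefore works only as long as every producer of namespace strings interns them, which is exactly the coupling Eric objects to.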
Re: Status of == vs equals() RESULTS
Xerces C DOM parser wrapped as a Java DOM. What I mean is that the conventional equals() should be preferred, even though == might have a small performance gain.

Eric

On Fri, Aug 13, 2010 at 10:50 AM, Chad La Joie laj...@itumi.biz wrote:

Which parser/DOM impl were you using?
Re: Status of == vs equals() RESULTS
Not to dispute your point, but more to clarify mine.

Mostly I wanted to make the minor note about the length test preventing most char-by-char comparison (assuming intern or other canonicalization takes care of equality, as in the rest of the discussion). Hash code was an afterthought, which came to mind since I had recently been researching string canonicalization alternatives to intern (e.g. via a HashSet).

I was only suggesting hashCode if *repeated* char-by-char comparison of unequal strings is causing performance problems (the case of same-length strings with a shared prefix was the most obvious; by the sound of it, SAML may actually make this relevant in this case).

The part I apparently didn't emphasize enough is that, yes, it only offers an advantage if strings are used repeatedly in problem comparisons (or hashCode has already been used): hashCode is calculated lazily, so it will only be calculated once per string (except for the unlikely case where the hash code matches the sentinel value 0), so for repeated use over a restricted set of strings the overhead can be amortized. (intern internally calculates a hash code for strings, but AFAIK this is not currently used to preset the cached code, so there will be a one-off hash calculation and associated cache churn; non-intern canonicalization using a hash table will have cached the result, so you get it for free.)

Raul Benito wrote:
As the original author of the changes from equals to == for interned namespaces, I can tell you that originally, on 1.4 and 1.5 and with my data (the verification of a SAML/Liberty AuthnReq in a multi-threaded test, with the old Juice JCE provider), the change was 10% to 20% faster. SAML is one of the real examples of signing and has some URLs with common prefixes and the same length. The Juice provider also helps to get rid of the signing/digest cost (a verification is two c14n operations: one of the signed part and one of the signature), but I think just a c14n is a good way of measuring it.

Also take into account that the == vs equals debate is more a memory workload cache problem: if we have to iterate over and over every char just to see that the strings are not equal, we trash the cache. (That's why I used multiple threads, to simulate a server decoding requests with more or less the same code, but at different times and under different workloads.) Nevertheless, if you have tested with a more modern JRE and the .equals code is behaving better, just go ahead and kiss goodbye to the ==.

Clive, using .hashCode for strings in this case is not a big speed-up, as it is going to go through all the chars of the string, trashing the cache again, and multiplying and adding the result to an integer, instead of failing on the first different char or just summarizing to a boolean.

Regards,
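As a rough illustration of the "canonicalization alternative to intern" idea mentioned in this reply, the sketch below pools strings in a hash table so that equal strings share one instance. The StringPool class and its canonical() method are hypothetical names invented for this example (a HashMap is used rather than a HashSet so the pooled instance can be returned), and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical string pool: maps each distinct string value to one canonical instance. */
final class StringPool {
    private final Map<String, String> pool = new HashMap<String, String>();

    /**
     * Returns the pooled instance equal to s, adding s to the pool if it is new.
     * The lookup calls s.hashCode(), which String caches on the instance after
     * the first computation.
     */
    String canonical(String s) {
        String existing = pool.get(s);
        if (existing != null) {
            return existing;
        }
        pool.put(s, s);
        return s;
    }
}
```

Strings obtained from the same pool can then be compared by reference, while equals() remains correct for strings that never went through the pool.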
Re: Status of == vs equals() RESULTS
So, I have some unexpected results from this work. I implemented a helper class that checked the equality of element local names, attribute local names, namespace URIs, and namespace prefixes (i.e. everything that Xerces always interns). Then I made sure to replace all ==, != and equals() that I could find with the appropriate call.

To test, I picked the Canonicalizer20010315ExclusiveTest test case and made two alterations to the test22*excl methods:
- do one c14n operation outside the timing loop just to make sure all the classes are in memory, constants are loaded, etc.
- in a 100 iteration loop, create a new canonicalizer, canonicalize a DOM tree, and time it using nanosecond time

I did this for the example2_2_1.xml[1], example2_2_2.xml[2], example2_2_3.xml[3] input files (test221excl, test222excl, test223excl respectively).

Here are the results, measured in nanosecond timing. total indicates the total time spent in all 100 runs, i.e. the summation of each of the 100 results.

test221excl:
            equals()          ==
  min         101000       99000
  max         123000      191000
  median      103000      105000
  avg         103760      106540
  total     10376000    10654000

test222excl:
            equals()          ==
  min          99000      101000
  max         192000      128000
  median          10      108000
  avg         102110      108480
  total     10211000    10848000

test223excl (an XPath nodeset canonicalization):
            equals()          ==
  min         254000      248000
  max             29      353000
  median      266000      265000
  avg         266820      265800
  total     26682000        2658

So, what these numbers appear to suggest is that, in fact, equals() is more often faster than ==. This seems counter-intuitive unless the JVM has specialized optimization for the String.equals() method. Can anyone see where my testing is likely to be flawed?
[1] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_1.xml?revision=350494&view=markup
[2] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_2.xml?revision=350494&view=markup
[3] http://svn.apache.org/viewvc/xml/security/trunk/data/org/apache/xml/security/c14n/inExcl/example2_2_3.xml?revision=350915&view=markup

On 8/2/10 10:11 AM, Chad La Joie wrote:
So, while I don't have my access yet, Colm asked me if I'd take a look at the == vs equals() issue (relevant bugs: 40897[1], 45637[2], 46681[3]).

My executive summary is that clearly, as things stand, the current code favors optimization over correctness. Rarely is this a good thing.

Colm notes[4] that the reliance on intern'ed strings (and thus the ability to use ==) occurs sporadically throughout the code and not just within the ElementChecker implementations. He specifically mentioned the various C14N implementations, and indeed == is used about 6 times there for string comparison.

My recommendation then is twofold:
- Ensure that nothing other than namespace bits are compared via ==. I don't know that this occurs but the code should definitely be reviewed to ensure that.
- Create a new NamespaceEqualityChecker that provides methods for checking the various bits of a namespace (URIs, prefixes) and use it anywhere that either == or equals() is used today. Implementations based on == and equals() would be provided, with the default implementation being equals()-based. A configuration option should then be made available to control which impl gets used. Additionally, it might even be possible to add some smarts that could detect known good parsers that use interning and automatically use the == based implementation.

I do not recommend changing any part of the code without addressing the whole codebase (i.e. all the =='s need to be fixed or no change should be made) because of the possibility of creating new, unwanted effects. The current functionality is undesirable, but better the devil you know.

I think that this should be addressed in the upcoming 1.4.4 release. If quick consensus can be reached, I'm willing to do the work within a window of time I have available over the next 2-3 weeks.

[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=40897
[2] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637
[3] https://issues.apache.org/bugzilla/show_bug.cgi?id=46681
[4] https://issues.apache.org/bugzilla/show_bug.cgi?id=45637#c1

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
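As a rough illustration of the second recommendation in the quoted proposal, a checker with == based and equals() based implementations might look like the sketch below. Only the name NamespaceEqualityChecker comes from the proposal; the method names, the two implementation classes, the Checkers factory and the system property name are assumptions made up for this example, not the project's API.

```java
/** Hypothetical shape of the proposed checker. */
interface NamespaceEqualityChecker {
    boolean sameUri(String a, String b);
    boolean samePrefix(String a, String b);
}

/** Always-correct variant: compares string contents, tolerating nulls. */
final class EqualsChecker implements NamespaceEqualityChecker {
    public boolean sameUri(String a, String b) {
        return a == null ? b == null : a.equals(b);
    }
    public boolean samePrefix(String a, String b) {
        return a == null ? b == null : a.equals(b);
    }
}

/** Reference comparison: only valid if the parser interns every name and URI. */
final class IdentityChecker implements NamespaceEqualityChecker {
    public boolean sameUri(String a, String b) {
        return a == b;
    }
    public boolean samePrefix(String a, String b) {
        return a == b;
    }
}

final class Checkers {
    /** Hypothetical configuration switch; the property name is invented. */
    static final String PROPERTY = "example.xmlsec.identityCompare";

    /** Defaults to the equals()-based implementation unless the property is "true". */
    static NamespaceEqualityChecker fromSystemProperty() {
        return Boolean.getBoolean(PROPERTY) ? new IdentityChecker() : new EqualsChecker();
    }
}
```

Choosing the implementation once at start-up, for example via a property passed with -D, keeps a single implementation in use for the lifetime of the JVM.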
Re: Status of == vs equals() RESULTS
Hello Chad,

What command line options did you use? My tests were more reliable if I used 100 warm-up runs to let the JIT work its magic, and then went for the timed test.

Also, are you running both tests in the same invocation? If you are, the second will be handicapped: the first one will simply be inlined, while the second will have a switch to check whether it is one implementation of the interface or the other.

Regards,
Raul
Re: Status of == vs equals() RESULTS
On 8/9/10 10:40 AM, Raul Benito wrote:
> What command line options did you use?

No options.

> My testings were more reliable if use 100 warms-up let the jit run its magic, and then go for the timed test.

Okay, I'll try that.

> Also are you running both tests in the same invocation if you do, the second will be handicap, as the first one will be just inline the second will have a switch to see if it is one interface or the other.

No, each run was in a clean JVM.

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
Re: Status of == vs equals() RESULTS
On 8/9/10 10:45 AM, Chad La Joie wrote:
>> My testings were more reliable if use 100 warms-up let the jit run its magic, and then go for the timed test.
>
> Okay, I'll try that.

It made no difference.

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
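The procedure discussed in these messages (some warm-up runs, then a fixed number of timed runs measured with nanosecond timing and summarized as min, max, median, avg and total) can be sketched roughly as follows. This is only an illustration: the TimingSketch class, its Stats helper and the time() method are hypothetical names, not code from the test case, and the Runnable stands in for one canonicalization.

```java
import java.util.Arrays;

/** Hypothetical timing harness; all names here are illustrative only. */
final class TimingSketch {

    /** Summary statistics over per-run timings, in nanoseconds. */
    static final class Stats {
        final long min, max, median, avg, total;

        Stats(long[] samples) {
            long[] sorted = samples.clone();
            Arrays.sort(sorted);
            long sum = 0;
            for (long s : sorted) {
                sum += s;
            }
            min = sorted[0];
            max = sorted[sorted.length - 1];
            median = sorted[sorted.length / 2]; // upper median for even counts
            total = sum;
            avg = sum / sorted.length;
        }
    }

    /** Runs the work untimed `warmups` times, then times `runs` further executions. */
    static Stats time(Runnable work, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            work.run(); // let class loading and JIT compilation happen first
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.run();
            samples[i] = System.nanoTime() - start;
        }
        return new Stats(samples);
    }
}
```

A call such as TimingSketch.time(work, 100, 100) would correspond to 100 warm-up runs followed by 100 timed runs; running each variant in a fresh JVM, as described above, avoids one variant influencing how the other is compiled.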
Re: Status of == vs equals() RESULTS
On Mon, Aug 9, 2010 at 4:45 PM, Chad La Joie laj...@itumi.biz wrote:
> On 8/9/10 10:40 AM, Raul Benito wrote:
>> What command line options did you use?
>
> No options.

I did mine with -server and sometimes with more memory, but it is really strange. What version of the JRE are you using?

Regards,
RE: Status of == vs equals() RESULTS
In JDK 1.5, String.equals() begins with:

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        ...

Since String is a final class, the JIT compiler is free to in-line String.equals(). This is such a common case that I bet the JIT compiler team made it a special case to in-line at least the beginning of String.equals() at every invocation site. If your test bed only uses interned Strings, this will return early, with the same behavior as == for equal strings.

Is it possible your test bed calls String.equals() with an overwhelming percentage of equal strings?
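To make the point about the identity check at the top of String.equals() concrete, here is a small sketch showing when two equal strings are also the same object. The InternFastPath class and the example URI value are chosen purely for illustration.

```java
/** Illustrative only: shows when equal strings are also the same reference. */
final class InternFastPath {

    /** Reference comparison, i.e. what == does for strings. */
    static boolean sameReference(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "http://www.w3.org/2000/09/xmldsig#"; // literal, so already interned
        String b = new String(a);                         // equal contents, distinct object
        String c = b.intern();                            // pooled instance, same object as a

        System.out.println(sameReference(a, b)); // false
        System.out.println(a.equals(b));         // true, after comparing the characters
        System.out.println(sameReference(a, c)); // true
        System.out.println(a.equals(c));         // true, via the this == anObject check
    }
}
```

When both operands are interned, equals() returns from its first check and does no character comparison, which is consistent with the small differences measured in this thread.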
Re: Status of == vs equals() RESULTS
On 8/9/10 11:10 AM, Raul Benito wrote:
> I did mine with -server and sometimes with more memory but it is really strange, what version of the JRE are you using?

What optimizations in particular did you want to take advantage of by using -server? Did you see anything to suggest that it was running out of memory? Those test files shouldn't produce anything that would use up the default amount of memory.

I'm using Apple's repackaging of Sun JDK 1.6.0_20, 64-bit.

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
Re: Status of == vs equals() RESULTS
I guess I didn't explicitly say this, but if, after a few days, people can't suggest an issue with this testing methodology or provide testing inputs that show different results, I'll rip out the helper class I added and just use equals() everywhere. That'll make the code a lot nicer to read.

--
Chad La Joie
http://itumi.biz
trusted identities, delivered
Re: Status of == vs equals() RESULTS
Have to agree .equals is the way to go, since the correctness of == is too reliant on what must be considered implementation optimisations in the parser. Benchmarking in the JVM is notoriously difficult, but it does look like there is no gross difference, which should kill any objections to doing it correctly.

Since I recently spent far too long researching this for an unrelated problem, I'll add my 10c to the detailed discussion.

On 10/08/10 01:23, Chad La Joie wrote:
> Not necessarily, there are a number of not-equal checks in there that should, in theory, perform better if you use == only. In such a case, the use of != will just be a single check while !equals() will result in a char-by-char comparison.

Actually, the next thing String.equals tests is length equality, so character comparison will only be reached if the strings are the same length. Since the char-by-char comparison returns on the first mismatch, only same-length strings with shared prefixes will show the expected slowness (namespace URIs are likely to share prefixes, but I think are not particularly likely to be the same length, unless actually equal). Thus String.equals is only likely to be slow when comparing long strings that are distinct objects but equal in content (so the intern or alternative string pooling techniques needed for == also benefit .equals, without all the nasty loopholes: even if .equals is occasionally slow, at least it is always right).

In circumstances where repeated tests are done with many length and prefix matches, adding a hash code inequality test, (s1.hashCode() == s2.hashCode()) && s1.equals(s2), could prevent practically all char-by-char checks for !equal cases (but if the same strings are never used repeatedly, the hash code calculation could be an issue; NB intern results in hash calculation for all strings anyway). Pooling is still needed to speed up matches for equality, though.
Re VM options: I feel -server is definitely the right test bed, both because of the more aggressive JIT and because the code is likely to see its heaviest real-world use in -server VMs.
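The hash code inequality test described in this message could be written along the lines of the sketch below. The HashGuardedEquals class and its method name are hypothetical, and the null handling is an addition for the example rather than something stated in the message.

```java
/** Hypothetical helper illustrating a hash-code pre-check before equals(). */
final class HashGuardedEquals {

    /**
     * Returns true if s1 and s2 have equal contents. Differing hash codes prove
     * inequality without any character comparison; matching hash codes still
     * need equals(), because distinct strings can share a hash code.
     */
    static boolean equalsWithHashGuard(String s1, String s2) {
        if (s1 == s2) {
            return true; // same reference, or both null
        }
        if (s1 == null || s2 == null) {
            return false;
        }
        return s1.hashCode() == s2.hashCode() && s1.equals(s2);
    }
}
```

As noted above, this only pays off when the same strings are compared repeatedly, since String computes its hash code once and caches it; for strings compared a single time, the hash calculation itself walks every character.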