Re: committing performance patches without performance numbers
The elephant in the room to me is what Ariel said: > > What about all the commits that don't intend to have a performance impact > but do? On top of that, it's all well and good to microbench the change you make in a vacuum but what happens when the resource usage implications of that change have side-effects on the rest of the system? I've always been a vocal proponent of proving your improvement w/numbers (be they microbenchmark or stress-based single node, etc), so I'm strongly +1 on us improving our hygiene on this front. On Thu, Mar 9, 2017 at 4:56 PM, Ariel Weisbergwrote: > Hi, > > I should clarify. Should in the sense of it was fine for the process we > had at the time (ugh!) but it's not what we should do in the future. > > Ariel > > On Thu, Mar 9, 2017, at 04:55 PM, Ariel Weisberg wrote: > > Hi, > > > > I agree that patch could have used it and it was amenable to > > micro-benchmarking. Just to be pedantic about process which is something > > I approve of to the chagrin of so many. > > > > On a completely related note that change also randomly boxes a boolean. > > > > Ariel > > > > On Thu, Mar 9, 2017, at 03:45 PM, Jonathan Haddad wrote: > > > I don't expect everyone to run a 500 node cluster off to the side to > test > > > their patches, but at least some indication that the contributor > started > > > Cassandra on their laptop would be a good sign. The JIRA I referenced > > > was > > > an optimization around List, Set and Map serialization. Would it > really > > > have been that crazy to run a handful of benchmarks locally and post > > > those > > > results? > > > > > > On Thu, Mar 9, 2017 at 12:26 PM Ariel Weisberg > wrote: > > > > > > > Hi, > > > > > > > > I think there are issues around the availability of hardware > sufficient > > > > to demonstrate the performance concerns under test. It's an open > source > > > > project without centralized infrastructure. A lot of performance > > > > contributions come from people running C* in production. They are > > > > already running these changes and have seen the improvement, but > > > > communicating that to the outside world in a convincing way can be > hard. > > > > > > > > Right now we don't even have performance in continuous integration. I > > > > think we are putting the cart before the horse in that respect. What > > > > about all the commits that don't intend to have a performance impact > but > > > > do? Even if we had performance metrics in CI who is going to triage > the > > > > results religiously? > > > > > > > > We also to my knowledge don't have benchmarks for key functionality > in > > > > cassandra-stress. Can cassandra-stress benchmark CAS? My > recollection is > > > > every time I looked is that it wasn't there. > > > > > > > > We can only set the bar as high as contributors are able to meet. > > > > Certainly if they can't justify why they can't benchmark the thing > they > > > > want to contribute then reviewers should make them go and benchmark > it. > > > > > > > > Regards, > > > > Ariel > > > > > > > > On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote: > > > > > Agree. Anything that's meant to increase performance should > demonstrate > > > > > it > > > > > actually does that. We have microbench available in recent > versions - > > > > > writing a new microbenchmark isn't all that onerous. Would be > great if we > > > > > had perf tests included in the normal testall/dtest workflow for > ALL > > > > > patches so we could quickly spot regressions, but that gets pretty > > > > > expensive in terms of running long enough tests to actually see > most > > > > > common > > > > > code paths. > > > > > > > > > > > > > > > On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad < > j...@jonhaddad.com> > > > > > wrote: > > > > > > > > > > > I'd like to discuss what I consider to be a pretty important > matter - > > > > > > patches which are written for the sole purpose of improving > performance > > > > > > without including a single performance benchmark in the JIRA. > > > > > > > > > > > > My original email was in "Testing and Jira Tickets", i'll copy > it here > > > > > > for posterity: > > > > > > > > > > > > If you don't mind, I'd like to broaden the discussion a little > bit to > > > > also > > > > > > discuss performance related patches. For instance, > CASSANDRA-13271 > > > > was a > > > > > > performance / optimization related patch that included *zero* > > > > information > > > > > > on if there was any perf improvement or a regression as a result > of the > > > > > > change, even though I've asked twice for that information. > > > > > > > > > > > > In addition to "does this thing break anything" we should be > asking > > > > "how > > > > > > does this patch affect performance?" (and were the appropriate > docs > > > > > > included, but that's another topic altogether) > > > > > > > > > > > > There's a minor note about perf related stuff here: > > > > > >
Re: committing performance patches without performance numbers
Hi, I should clarify. Should in the sense of it was fine for the process we had at the time (ugh!) but it's not what we should do in the future. Ariel On Thu, Mar 9, 2017, at 04:55 PM, Ariel Weisberg wrote: > Hi, > > I agree that patch could have used it and it was amenable to > micro-benchmarking. Just to be pedantic about process which is something > I approve of to the chagrin of so many. > > On a completely related note that change also randomly boxes a boolean. > > Ariel > > On Thu, Mar 9, 2017, at 03:45 PM, Jonathan Haddad wrote: > > I don't expect everyone to run a 500 node cluster off to the side to test > > their patches, but at least some indication that the contributor started > > Cassandra on their laptop would be a good sign. The JIRA I referenced > > was > > an optimization around List, Set and Map serialization. Would it really > > have been that crazy to run a handful of benchmarks locally and post > > those > > results? > > > > On Thu, Mar 9, 2017 at 12:26 PM Ariel Weisbergwrote: > > > > > Hi, > > > > > > I think there are issues around the availability of hardware sufficient > > > to demonstrate the performance concerns under test. It's an open source > > > project without centralized infrastructure. A lot of performance > > > contributions come from people running C* in production. They are > > > already running these changes and have seen the improvement, but > > > communicating that to the outside world in a convincing way can be hard. > > > > > > Right now we don't even have performance in continuous integration. I > > > think we are putting the cart before the horse in that respect. What > > > about all the commits that don't intend to have a performance impact but > > > do? Even if we had performance metrics in CI who is going to triage the > > > results religiously? > > > > > > We also to my knowledge don't have benchmarks for key functionality in > > > cassandra-stress. Can cassandra-stress benchmark CAS? My recollection is > > > every time I looked is that it wasn't there. > > > > > > We can only set the bar as high as contributors are able to meet. > > > Certainly if they can't justify why they can't benchmark the thing they > > > want to contribute then reviewers should make them go and benchmark it. > > > > > > Regards, > > > Ariel > > > > > > On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote: > > > > Agree. Anything that's meant to increase performance should demonstrate > > > > it > > > > actually does that. We have microbench available in recent versions - > > > > writing a new microbenchmark isn't all that onerous. Would be great if > > > > we > > > > had perf tests included in the normal testall/dtest workflow for ALL > > > > patches so we could quickly spot regressions, but that gets pretty > > > > expensive in terms of running long enough tests to actually see most > > > > common > > > > code paths. > > > > > > > > > > > > On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad > > > > wrote: > > > > > > > > > I'd like to discuss what I consider to be a pretty important matter - > > > > > patches which are written for the sole purpose of improving > > > > > performance > > > > > without including a single performance benchmark in the JIRA. > > > > > > > > > > My original email was in "Testing and Jira Tickets", i'll copy it here > > > > > for posterity: > > > > > > > > > > If you don't mind, I'd like to broaden the discussion a little bit to > > > also > > > > > discuss performance related patches. For instance, CASSANDRA-13271 > > > was a > > > > > performance / optimization related patch that included *zero* > > > information > > > > > on if there was any perf improvement or a regression as a result of > > > > > the > > > > > change, even though I've asked twice for that information. > > > > > > > > > > In addition to "does this thing break anything" we should be asking > > > "how > > > > > does this patch affect performance?" (and were the appropriate docs > > > > > included, but that's another topic altogether) > > > > > > > > > > There's a minor note about perf related stuff here: > > > > > http://cassandra.apache.org/doc/latest/development/ > > > > > testing.html#performance-testing > > > > > > > > > > > > > > > "Performance tests for Cassandra are a special breed of tests that are > > > not > > > > > part of the usual patch contribution process. In fact you can > > > contribute > > > > > tons of patches to Cassandra without ever running performance tests. > > > They > > > > > are important however when working on performance improvements, as > > > > > such > > > > > improvements must be measurable." > > > > > > > > > > I think performance numbers aren't just important, but should be a > > > > > *requirement* to merge a performance patch. > > > > > > > > > > Thoughts? > > > > > Jon > > > > > > > >
Re: committing performance patches without performance numbers
Hi, I agree that patch could have used it and it was amenable to micro-benchmarking. Just to be pedantic about process which is something I approve of to the chagrin of so many. On a completely related note that change also randomly boxes a boolean. Ariel On Thu, Mar 9, 2017, at 03:45 PM, Jonathan Haddad wrote: > I don't expect everyone to run a 500 node cluster off to the side to test > their patches, but at least some indication that the contributor started > Cassandra on their laptop would be a good sign. The JIRA I referenced > was > an optimization around List, Set and Map serialization. Would it really > have been that crazy to run a handful of benchmarks locally and post > those > results? > > On Thu, Mar 9, 2017 at 12:26 PM Ariel Weisbergwrote: > > > Hi, > > > > I think there are issues around the availability of hardware sufficient > > to demonstrate the performance concerns under test. It's an open source > > project without centralized infrastructure. A lot of performance > > contributions come from people running C* in production. They are > > already running these changes and have seen the improvement, but > > communicating that to the outside world in a convincing way can be hard. > > > > Right now we don't even have performance in continuous integration. I > > think we are putting the cart before the horse in that respect. What > > about all the commits that don't intend to have a performance impact but > > do? Even if we had performance metrics in CI who is going to triage the > > results religiously? > > > > We also to my knowledge don't have benchmarks for key functionality in > > cassandra-stress. Can cassandra-stress benchmark CAS? My recollection is > > every time I looked is that it wasn't there. > > > > We can only set the bar as high as contributors are able to meet. > > Certainly if they can't justify why they can't benchmark the thing they > > want to contribute then reviewers should make them go and benchmark it. > > > > Regards, > > Ariel > > > > On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote: > > > Agree. Anything that's meant to increase performance should demonstrate > > > it > > > actually does that. We have microbench available in recent versions - > > > writing a new microbenchmark isn't all that onerous. Would be great if we > > > had perf tests included in the normal testall/dtest workflow for ALL > > > patches so we could quickly spot regressions, but that gets pretty > > > expensive in terms of running long enough tests to actually see most > > > common > > > code paths. > > > > > > > > > On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad > > > wrote: > > > > > > > I'd like to discuss what I consider to be a pretty important matter - > > > > patches which are written for the sole purpose of improving performance > > > > without including a single performance benchmark in the JIRA. > > > > > > > > My original email was in "Testing and Jira Tickets", i'll copy it here > > > > for posterity: > > > > > > > > If you don't mind, I'd like to broaden the discussion a little bit to > > also > > > > discuss performance related patches. For instance, CASSANDRA-13271 > > was a > > > > performance / optimization related patch that included *zero* > > information > > > > on if there was any perf improvement or a regression as a result of the > > > > change, even though I've asked twice for that information. > > > > > > > > In addition to "does this thing break anything" we should be asking > > "how > > > > does this patch affect performance?" (and were the appropriate docs > > > > included, but that's another topic altogether) > > > > > > > > There's a minor note about perf related stuff here: > > > > http://cassandra.apache.org/doc/latest/development/ > > > > testing.html#performance-testing > > > > > > > > > > > > "Performance tests for Cassandra are a special breed of tests that are > > not > > > > part of the usual patch contribution process. In fact you can > > contribute > > > > tons of patches to Cassandra without ever running performance tests. > > They > > > > are important however when working on performance improvements, as such > > > > improvements must be measurable." > > > > > > > > I think performance numbers aren't just important, but should be a > > > > *requirement* to merge a performance patch. > > > > > > > > Thoughts? > > > > Jon > > > > > >
Re: committing performance patches without performance numbers
I don't expect everyone to run a 500 node cluster off to the side to test their patches, but at least some indication that the contributor started Cassandra on their laptop would be a good sign. The JIRA I referenced was an optimization around List, Set and Map serialization. Would it really have been that crazy to run a handful of benchmarks locally and post those results? On Thu, Mar 9, 2017 at 12:26 PM Ariel Weisbergwrote: > Hi, > > I think there are issues around the availability of hardware sufficient > to demonstrate the performance concerns under test. It's an open source > project without centralized infrastructure. A lot of performance > contributions come from people running C* in production. They are > already running these changes and have seen the improvement, but > communicating that to the outside world in a convincing way can be hard. > > Right now we don't even have performance in continuous integration. I > think we are putting the cart before the horse in that respect. What > about all the commits that don't intend to have a performance impact but > do? Even if we had performance metrics in CI who is going to triage the > results religiously? > > We also to my knowledge don't have benchmarks for key functionality in > cassandra-stress. Can cassandra-stress benchmark CAS? My recollection is > every time I looked is that it wasn't there. > > We can only set the bar as high as contributors are able to meet. > Certainly if they can't justify why they can't benchmark the thing they > want to contribute then reviewers should make them go and benchmark it. > > Regards, > Ariel > > On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote: > > Agree. Anything that's meant to increase performance should demonstrate > > it > > actually does that. We have microbench available in recent versions - > > writing a new microbenchmark isn't all that onerous. Would be great if we > > had perf tests included in the normal testall/dtest workflow for ALL > > patches so we could quickly spot regressions, but that gets pretty > > expensive in terms of running long enough tests to actually see most > > common > > code paths. > > > > > > On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad > > wrote: > > > > > I'd like to discuss what I consider to be a pretty important matter - > > > patches which are written for the sole purpose of improving performance > > > without including a single performance benchmark in the JIRA. > > > > > > My original email was in "Testing and Jira Tickets", i'll copy it here > > > for posterity: > > > > > > If you don't mind, I'd like to broaden the discussion a little bit to > also > > > discuss performance related patches. For instance, CASSANDRA-13271 > was a > > > performance / optimization related patch that included *zero* > information > > > on if there was any perf improvement or a regression as a result of the > > > change, even though I've asked twice for that information. > > > > > > In addition to "does this thing break anything" we should be asking > "how > > > does this patch affect performance?" (and were the appropriate docs > > > included, but that's another topic altogether) > > > > > > There's a minor note about perf related stuff here: > > > http://cassandra.apache.org/doc/latest/development/ > > > testing.html#performance-testing > > > > > > > > > "Performance tests for Cassandra are a special breed of tests that are > not > > > part of the usual patch contribution process. In fact you can > contribute > > > tons of patches to Cassandra without ever running performance tests. > They > > > are important however when working on performance improvements, as such > > > improvements must be measurable." > > > > > > I think performance numbers aren't just important, but should be a > > > *requirement* to merge a performance patch. > > > > > > Thoughts? > > > Jon > > > >
Re: committing performance patches without performance numbers
Hi, I think there are issues around the availability of hardware sufficient to demonstrate the performance concerns under test. It's an open source project without centralized infrastructure. A lot of performance contributions come from people running C* in production. They are already running these changes and have seen the improvement, but communicating that to the outside world in a convincing way can be hard. Right now we don't even have performance in continuous integration. I think we are putting the cart before the horse in that respect. What about all the commits that don't intend to have a performance impact but do? Even if we had performance metrics in CI who is going to triage the results religiously? We also to my knowledge don't have benchmarks for key functionality in cassandra-stress. Can cassandra-stress benchmark CAS? My recollection is every time I looked is that it wasn't there. We can only set the bar as high as contributors are able to meet. Certainly if they can't justify why they can't benchmark the thing they want to contribute then reviewers should make them go and benchmark it. Regards, Ariel On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote: > Agree. Anything that's meant to increase performance should demonstrate > it > actually does that. We have microbench available in recent versions - > writing a new microbenchmark isn't all that onerous. Would be great if we > had perf tests included in the normal testall/dtest workflow for ALL > patches so we could quickly spot regressions, but that gets pretty > expensive in terms of running long enough tests to actually see most > common > code paths. > > > On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad> wrote: > > > I'd like to discuss what I consider to be a pretty important matter - > > patches which are written for the sole purpose of improving performance > > without including a single performance benchmark in the JIRA. > > > > My original email was in "Testing and Jira Tickets", i'll copy it here > > for posterity: > > > > If you don't mind, I'd like to broaden the discussion a little bit to also > > discuss performance related patches. For instance, CASSANDRA-13271 was a > > performance / optimization related patch that included *zero* information > > on if there was any perf improvement or a regression as a result of the > > change, even though I've asked twice for that information. > > > > In addition to "does this thing break anything" we should be asking "how > > does this patch affect performance?" (and were the appropriate docs > > included, but that's another topic altogether) > > > > There's a minor note about perf related stuff here: > > http://cassandra.apache.org/doc/latest/development/ > > testing.html#performance-testing > > > > > > "Performance tests for Cassandra are a special breed of tests that are not > > part of the usual patch contribution process. In fact you can contribute > > tons of patches to Cassandra without ever running performance tests. They > > are important however when working on performance improvements, as such > > improvements must be measurable." > > > > I think performance numbers aren't just important, but should be a > > *requirement* to merge a performance patch. > > > > Thoughts? > > Jon > >
Re: committing performance patches without performance numbers
After looking at the patch, my thoughts (beware, it's getting very technical): Original code: -t = new ListType(elements, isMultiCell); -ListType t2 = internMap.putIfAbsent(elements, t); -t = (t2 == null) ? t : t2; Optimized code: +t = internMap.computeIfAbsent(elements, K -> new ListType<>(K, isMultiCell) ); This is a very obvious and *classical* optimization. In the original code, we create an instance of ListType() but it is ONLY usefull if the interMap doesn't have any value (putIfAbsent). Meaning that the creation of the ListType object is useless if the map already contains a value. The optimized code use the computeIfAbsent with a lambda (K -> new ListType...) which will be evaluated iff the map doesn't contains the key For a high performance project like Cassandra, avoiding to create un-necessary objects to help keeping the heap size small is always a good idea. And this kind of optimization using lambda for lazy evaluation is quite common (see a PR here on my Achilles project: https://github.com/doanduyhai/Achilles/commit/9b9c09aa26d0edb8c58222b1ad069327b6f40da3 ) My guess is that the perf improvement and the gain is so obvious for Java people that the JIRA creator thought a perf benchmark is not necessary On Thu, Mar 9, 2017 at 9:00 PM, Jonathan Haddadwrote: > I'd like to discuss what I consider to be a pretty important matter - > patches which are written for the sole purpose of improving performance > without including a single performance benchmark in the JIRA. > > My original email was in "Testing and Jira Tickets", i'll copy it here > for posterity: > > If you don't mind, I'd like to broaden the discussion a little bit to also > discuss performance related patches. For instance, CASSANDRA-13271 was a > performance / optimization related patch that included *zero* information > on if there was any perf improvement or a regression as a result of the > change, even though I've asked twice for that information. > > In addition to "does this thing break anything" we should be asking "how > does this patch affect performance?" (and were the appropriate docs > included, but that's another topic altogether) > > There's a minor note about perf related stuff here: > http://cassandra.apache.org/doc/latest/development/ > testing.html#performance-testing > > > "Performance tests for Cassandra are a special breed of tests that are not > part of the usual patch contribution process. In fact you can contribute > tons of patches to Cassandra without ever running performance tests. They > are important however when working on performance improvements, as such > improvements must be measurable." > > I think performance numbers aren't just important, but should be a > *requirement* to merge a performance patch. > > Thoughts? > Jon >
Re: committing performance patches without performance numbers
Agree. Anything that's meant to increase performance should demonstrate it actually does that. We have microbench available in recent versions - writing a new microbenchmark isn't all that onerous. Would be great if we had perf tests included in the normal testall/dtest workflow for ALL patches so we could quickly spot regressions, but that gets pretty expensive in terms of running long enough tests to actually see most common code paths. On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddadwrote: > I'd like to discuss what I consider to be a pretty important matter - > patches which are written for the sole purpose of improving performance > without including a single performance benchmark in the JIRA. > > My original email was in "Testing and Jira Tickets", i'll copy it here > for posterity: > > If you don't mind, I'd like to broaden the discussion a little bit to also > discuss performance related patches. For instance, CASSANDRA-13271 was a > performance / optimization related patch that included *zero* information > on if there was any perf improvement or a regression as a result of the > change, even though I've asked twice for that information. > > In addition to "does this thing break anything" we should be asking "how > does this patch affect performance?" (and were the appropriate docs > included, but that's another topic altogether) > > There's a minor note about perf related stuff here: > http://cassandra.apache.org/doc/latest/development/ > testing.html#performance-testing > > > "Performance tests for Cassandra are a special breed of tests that are not > part of the usual patch contribution process. In fact you can contribute > tons of patches to Cassandra without ever running performance tests. They > are important however when working on performance improvements, as such > improvements must be measurable." > > I think performance numbers aren't just important, but should be a > *requirement* to merge a performance patch. > > Thoughts? > Jon >