Re: committing performance patches without performance numbers

2017-03-10 Thread Josh McKenzie
The elephant in the room to me is what Ariel said:
>
> What about all the commits that don't intend to have a performance impact
> but do?


On top of that, it's all well and good to microbench the change you make in
a vacuum, but what happens when the resource-usage implications of that
change have side effects on the rest of the system?

I've always been a vocal proponent of proving your improvement with numbers
(be they from a microbenchmark, single-node stress runs, etc.), so I'm
strongly +1 on us improving our hygiene on this front.


Re: committing performance patches without performance numbers

2017-03-09 Thread Ariel Weisberg
Hi,

I should clarify. "Should" in the sense that it was fine for the process we
had at the time (ugh!), but it's not what we should do in the future.

Ariel


Re: committing performance patches without performance numbers

2017-03-09 Thread Ariel Weisberg
Hi,

I agree that patch could have used it and it was amenable to
micro-benchmarking. Just to be pedantic about process, which is something
I approve of, to the chagrin of so many.

On a completely related note, that change also randomly boxes a boolean.
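
For anyone unfamiliar with the pitfall, a minimal illustration of accidental
autoboxing (hypothetical code, not the actual line from the patch):

import java.util.ArrayList;
import java.util.List;

public class BoxingDemo
{
    public static void main(String[] args)
    {
        boolean isMultiCell = true;
        // Assigning a primitive to the wrapper type autoboxes via Boolean.valueOf().
        // Boolean boxes are cached, so it's cheap, but it still adds an indirection
        // and an unboxing on every later read:
        Boolean boxed = isMultiCell;
        // Passing a primitive where a generic parameter is expected boxes as well:
        List<Boolean> flags = new ArrayList<>();
        flags.add(isMultiCell);
        System.out.println(boxed + " " + flags);
    }
}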

Ariel


Re: committing performance patches without performance numbers

2017-03-09 Thread Jonathan Haddad
I don't expect everyone to run a 500-node cluster off to the side to test
their patches, but at least some indication that the contributor started
Cassandra on their laptop would be a good sign. The JIRA I referenced was
an optimization around List, Set and Map serialization. Would it really
have been that crazy to run a handful of benchmarks locally and post those
results?
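
Even a default cassandra-stress run against the unpatched and patched builds
would have been something. A minimal sketch, assuming a single local node and
the stock stress schema (from a source checkout; packaged installs put
cassandra-stress on the PATH):

# write a million partitions, then read them back, 50 client threads each
tools/bin/cassandra-stress write n=1000000 -rate threads=50
tools/bin/cassandra-stress read n=1000000 -rate threads=50

Pasting the op rate and latency summaries from both builds into the JIRA
would have settled the question in minutes.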


Re: committing performance patches without performance numbers

2017-03-09 Thread Ariel Weisberg
Hi,

I think there are issues around the availability of hardware sufficient
to demonstrate the performance concerns under test. It's an open source
project without centralized infrastructure. A lot of performance
contributions come from people running C* in production. They are
already running these changes and have seen the improvement, but
communicating that to the outside world in a convincing way can be hard.

Right now we don't even have performance testing in continuous integration.
I think we are putting the cart before the horse in that respect. What
about all the commits that don't intend to have a performance impact but
do? Even if we had performance metrics in CI, who is going to triage the
results religiously?

We also, to my knowledge, don't have benchmarks for key functionality in
cassandra-stress. Can cassandra-stress benchmark CAS? My recollection from
every time I looked is that it wasn't there.

We can only set the bar as high as contributors are able to meet.
Certainly, if they can't justify why they can't benchmark the thing they
want to contribute, then reviewers should make them go and benchmark it.

Regards,
Ariel


Re: committing performance patches without performance numbers

2017-03-09 Thread DuyHai Doan
After looking at the patch, my thoughts (beware, it's getting very
technical):


Original code:

-t = new ListType<T>(elements, isMultiCell);
-ListType<T> t2 = internMap.putIfAbsent(elements, t);
-t = (t2 == null) ? t : t2;


Optimized code:

+t = internMap.computeIfAbsent(elements, K -> new ListType<>(K, isMultiCell));


This is a very obvious and *classical* optimization. In the original code,
we create a ListType instance that is ONLY useful if internMap doesn't
already hold a value for that key (putIfAbsent); the newly created ListType
object is wasted whenever the map already contains one.

The optimized code uses computeIfAbsent with a lambda (K -> new
ListType...) which is evaluated only if the map doesn't contain the key.

For a high-performance project like Cassandra, avoiding the creation of
unnecessary objects to help keep heap pressure low is always a good
idea.
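
To make the difference concrete, here is a self-contained sketch (a
hypothetical String-keyed intern map, not the actual Cassandra types):

import java.util.concurrent.ConcurrentHashMap;

public class InternDemo
{
    private static final ConcurrentHashMap<String, String> internMap = new ConcurrentHashMap<>();

    // putIfAbsent style: the candidate value is constructed on EVERY call,
    // and simply thrown away when the key is already interned.
    static String getEager(String key)
    {
        String t = new String("value-for-" + key); // always allocated
        String t2 = internMap.putIfAbsent(key, t);
        return (t2 == null) ? t : t2;
    }

    // computeIfAbsent style: the lambda runs only on a miss, so the
    // allocation is skipped entirely when the key is already present.
    static String getLazy(String key)
    {
        return internMap.computeIfAbsent(key, k -> new String("value-for-" + k));
    }

    public static void main(String[] args)
    {
        System.out.println(getEager("a")); // allocates, then interns
        System.out.println(getLazy("a"));  // cache hit: no allocation at all
    }
}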

And this kind of optimization, using a lambda for lazy evaluation, is quite
common (see a PR on my Achilles project:
https://github.com/doanduyhai/Achilles/commit/9b9c09aa26d0edb8c58222b1ad069327b6f40da3
)

My guess is that the perf improvement is so obvious to Java folks that the
JIRA creator thought a perf benchmark was not necessary.


Re: committing performance patches without performance numbers

2017-03-09 Thread Jeff Jirsa
Agree. Anything that's meant to increase performance should demonstrate it
actually does that. We have the microbench (JMH) target available in recent
versions - writing a new microbenchmark isn't all that onerous. It would be
great if we had perf tests included in the normal testall/dtest workflow for
ALL patches so we could quickly spot regressions, but that gets pretty
expensive in terms of running tests long enough to actually exercise the
most common code paths.
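
For anyone who hasn't written one, a minimal sketch of the shape a JMH
benchmark takes (hypothetical class and workload, not one that ships with
C*; assumes JMH on the classpath as under test/microbench):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class ListIterationBench
{
    @Param({ "10", "1000" })
    int size;

    List<Integer> list;

    @Setup
    public void setup()
    {
        // Build the fixture once per trial, outside the measured method.
        list = new ArrayList<>(size);
        for (int i = 0; i < size; i++)
            list.add(i);
    }

    @Benchmark
    public int iterate()
    {
        int sum = 0;
        for (int v : list)
            sum += v;
        return sum; // returned so the JIT can't dead-code-eliminate the loop
    }
}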


On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad  wrote:

> I'd like to discuss what I consider to be a pretty important matter -
> patches which are written for the sole purpose of improving performance
> without including a single performance benchmark in the JIRA.
>
> My original email was in "Testing and Jira Tickets"; I'll copy it here
> for posterity:
>
> If you don't mind, I'd like to broaden the discussion a little bit to also
> discuss performance-related patches. For instance, CASSANDRA-13271 was a
> performance / optimization related patch that included *zero* information
> on whether there was any perf improvement or a regression as a result of
> the change, even though I've asked twice for that information.
>
> In addition to "does this thing break anything" we should be asking "how
> does this patch affect performance?" (and were the appropriate docs
> included, but that's another topic altogether)
>
> There's a minor note about perf related stuff here:
> http://cassandra.apache.org/doc/latest/development/testing.html#performance-testing
>
>
> "Performance tests for Cassandra are a special breed of tests that are not
> part of the usual patch contribution process. In fact you can contribute
> tons of patches to Cassandra without ever running performance tests. They
> are important however when working on performance improvements, as such
> improvements must be measurable."
>
> I think performance numbers aren't just important, but should be a
> *requirement* to merge a performance patch.
>
> Thoughts?
> Jon
>