I was glad to read your last mail with a softer tone, Ishan. Respect! I really 
appreciate all your hard work and passion for Lucene/Solr, and thanks for 
putting time into benchmarking; it really shows that you care about the 
project! What I think we need more than anything else going forward (myself 
included) is to lower the friction level in some of these conversations, 
interpreting others with the best intent in mind. We’ll all benefit if we 
manage to make Lucene/Solr once again a truly welcoming and warm community, 
while maintaining high professional standards of course.

I think we could benefit from more online committer meetings as well, the one 
we had back in March was great, and a good place to discuss project matters 
like benchmarking or big refactorings.

Jan

> On 12 Aug 2020, at 14:32, Ishan Chattopadhyaya 
> <ichattopadhy...@gmail.com> wrote:
> 
> Hi All,
> I went through all the concerns voiced above and took a step back and 
> re-assessed my position. I am withdrawing my initial proposal to 
> request/nag/demand/veto issues without a performance test.
> I shall not insist on that and apologize for using such language as I did 
> above. I hope, though, that we all do our best to preserve (and only improve) 
> the performance characteristics of Solr for the sake of our users.
> Thanks to everyone for your inputs.
> Regards,
> Ishan
> 
> On Wed, Aug 12, 2020 at 10:36 AM Ishan Chattopadhyaya 
> <ichattopadhy...@gmail.com> wrote:
> > Maybe if we have a common benchmarking suite, such efforts will take less 
> > effort and can actually be contributed back so that we can potentially 
> > monitor the matter.
> 
> I am +1 to contributing this to an Apache repository the moment it is 
> stable. Once periodic numbers start getting published, the risk of the suite 
> being abandoned is reduced. Two more things need to happen before then: 
> 1. identifying datasets and queries (I'm making progress) and 2. a web UI 
> that plots charts based on those numbers. Help welcome.
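> 
> In case it helps whoever picks up the charting piece, here is a rough Python 
> sketch of plotting a series of nightly numbers; the CSV name and layout 
> (date, docs/sec) are purely my assumption for illustration, not a format the 
> suite emits today:
> 
>     import csv
>     from datetime import date
>     import matplotlib.pyplot as plt
> 
>     # Assumed layout: one row per nightly run, e.g. "2020-08-12,12345.6"
>     dates, throughput = [], []
>     with open("nightly.csv", newline="") as f:
>         for row in csv.reader(f):
>             dates.append(date.fromisoformat(row[0]))
>             throughput.append(float(row[1]))
> 
>     plt.plot(dates, throughput, marker="o")
>     plt.ylabel("docs/sec")
>     plt.title("solr-bench nightly indexing throughput")
>     plt.savefig("nightly.png")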
> 
> > Whatever we do or don't do is imperfect. I hope some "mandate" doesn't stop 
> > progress. 
> > We don't go changing code just for the heck of it; we do it for a variety 
> > of reasons.
> 
> We sometimes do: https://issues.apache.org/jira/browse/SOLR-12845. I don't 
> want to stop progress, but I want to avoid situations where someone commits 
> an issue (e.g. SOLR-12845), it causes a massive regression (SOLR-14665), and 
> others have to come and fix the situation 
> (https://issues.apache.org/jira/browse/SOLR-14706 and releases) with very 
> little help or support from the original committer. Just because there was no 
> mandate in place, hours and hours of effort have already been wasted on that 
> issue, to say nothing of the users who are suffering as well. 
> 
> Requesting performance testing for all features affecting critical code 
> paths seemed like the most constructive way to tackle this situation, but if 
> any other solution comes to mind, please suggest it.
> 
> > If those things are blocked, we'll be trading the opportunity cost of the 
> > change for the performance risk. Each issue is different -- it has its own 
> > risk-reward trade-off. Just keep this in mind, Ishan.
> 
> I totally understand.
> 
> On Wed, Aug 12, 2020 at 10:18 AM Ishan Chattopadhyaya 
> <ichattopadhy...@gmail.com> wrote:
> > I don't think the problem is that nobody cares; more likely the problem is 
> > that it's hard, and there's always a tug of war between getting things done 
> > and out there where people can benefit from the feature/fix etc. vs. the 
> > risk that they stall out waiting for one more thing to do.
> I have tried hard to stay constructive in this effort and in my intentions, 
> so I will not repeat what I have said in the past.
> 
> > If the time to complete a task grows the likelihood that real life, and 
> > jobs interrupt it grows, and the chance it lingers indefinitely or is 
> > abandoned goes up.
> I'm afraid that shouldn't be an excuse to skip due diligence. It is better 
> not to commit something that is not performance tested (and affects default 
> code paths for every user) than to commit it, cause a regression, and have 
> other people come and clean up the performance mess after you.
> 
> 
> 
> On Wed, Aug 12, 2020 at 10:03 AM Ishan Chattopadhyaya 
> <ichattopadhy...@gmail.com> wrote:
> > I was going to use the data set that Mike uses for the Lucene nightly 
> > benchmarks
> I've gone with the same in the suite to begin with: 
> https://github.com/TheSearchStack/solr-bench/blob/master/small-data/small-enwiki.tsv.gz
> The larger file can be downloaded and used as well.
> 
> The suite is also capable of using .jsonl files, and I'm building another 
> dataset (based on Hacker News articles) for that at the moment.
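> 
> For anyone who wants to peek at these datasets outside the suite, here is a 
> minimal Python sketch for streaming both formats; the TSV column layout and 
> the .jsonl filename are my assumptions for illustration:
> 
>     import gzip
>     import json
> 
>     # Gzipped TSV: one document per line, tab-separated fields.
>     with gzip.open("small-enwiki.tsv.gz", "rt", encoding="utf-8") as f:
>         for line in f:
>             fields = line.rstrip("\n").split("\t")
>             print(fields[0])  # assumed: the first field is the title
>             break             # just peek at the first document
> 
>     # .jsonl: one JSON object per line ("hn-articles.jsonl" is hypothetical).
>     with open("hn-articles.jsonl", encoding="utf-8") as f:
>         for line in f:
>             doc = json.loads(line)
>             print(sorted(doc))  # list the document's field names
>             break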
> 
> On Wed, Aug 12, 2020 at 10:00 AM Ishan Chattopadhyaya 
> <ichattopadhy...@gmail.com> wrote:
> Here's the local mode example: 
> https://github.com/TheSearchStack/solr-bench/blob/master/config-local.json
> (Here, please ignore the JDK URL; it is downloaded, but the system JDK is 
> used.)
> 
> A pre-built Solr can be used as per 
> https://github.com/TheSearchStack/solr-bench/blob/master/config-prebuilt.json
> (I just added this). In this example, Solr is downloaded from the given URL 
> and used. Alternatively, you can build a Solr tarball, place it in the 
> solr-bench directory, and specify its name (not the full path) in the 
> "solr-package" parameter.
> 
> When both "solr-package" and "repository" are specified, the former is used 
> and the latter is ignored. If only the latter is specified ("repository"), 
> Solr is compiled/built using the specified commit point.
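> 
> To make that concrete, here is a hand-written sketch of the relevant config 
> fragment; the exact shape of the "repository" block and the tarball name are 
> illustrative only, so please check config-prebuilt.json and config-local.json 
> for the real schema:
> 
>     {
>       "solr-package": "solr-8.7.0-SNAPSHOT.tgz",
>       "repository": { "commit-id": "<sha>" }
>     }
> 
> With both present, as above, the tarball named in "solr-package" wins and the 
> "repository" block is ignored; remove "solr-package" and Solr gets built from 
> the given commit instead.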
> 
> 
> 
> 
> On Wed, Aug 12, 2020 at 6:17 AM Mike Drob <md...@apache.org> wrote:
> Can you give examples of this? I don’t see them in the repo. 
> 
> On Tue, Aug 11, 2020 at 4:30 PM Ishan Chattopadhyaya 
> <ichattopadhy...@gmail.com> wrote:
> Local mode uses the installed JDK. GCP mode can pick up a JDK URL as 
> configured. It is just one configuration, among many, that can be changed as 
> per the needs of the benchmark. The benchmarks can be used with almost any 
> branch (just specify the commit sha in the "repository" section, or 
> alternatively build a Solr tgz separately and refer to it in the 
> "solr-package" parameter).
> 
> 
> On Wed, 12 Aug, 2020, 2:39 am Mike Drob, <md...@apache.org> wrote:
> Hi Ishan,
> 
> Thanks for starting this conversation! I think it's important to pay 
> attention to performance, but I also have some concerns with coming out with 
> such a strong mandate. In the repository, I'm looking at how to run in local 
> mode, and I see that it looks like it will try to download a JDK from some 
> university website? That seems overly restrictive to me; why can't we use 
> the already-installed JDK?
> 
> Is the benchmark suite designed for master? Or for branch_8x?
> 
> Mike
> 
> On Tue, Aug 11, 2020 at 9:04 AM Ishan Chattopadhyaya 
> <ichattopadhy...@gmail.com> wrote:
> Hi Everyone!
>    From now on, I intend to request/nag/demand/veto that code changes which 
> affect default code paths for most users be accompanied by performance 
> testing numbers (e.g. [1]). Opt-in features are fine; I won't personally 
> bother about them (but if you'd like to perf test them, it would set a great 
> precedent anyway).
> 
> I will also work on setting up automated performance and stress testing [2], 
> but in the absence of that, let us do performance tests manually and report 
> them in the JIRA. Unless we hold ourselves to a high standard, performance 
> will become a joke whereby regressions can creep in without the committer(s) 
> taking any responsibility towards the users affected by them (SOLR-14665).
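> 
> As a straw man for what "report them in the JIRA" could look like, here is a 
> minimal plain-text template (every value is a placeholder for the tester to 
> fill in, not a measurement):
> 
>     Benchmark: indexing small-enwiki.tsv.gz, local mode, <n> nodes
>     Baseline (master @ <sha>):  <docs/sec>
>     Patched (this patch/PR):    <docs/sec>
>     Delta:                      <percent>
>     Queries: <query set>, p50/p95 latency before vs. after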
> 
> A benchmarking suite that I am working on is at 
> https://github.com/thesearchstack/solr-bench (SOLR-10317). A stress test 
> suite is under development (SOLR-13933). If you wish to use either of these, 
> I shall offer help and support (please ping me on Slack directly or in 
> #solr-dev, or open a GitHub issue on that repo).
> 
> Regards,
> Ishan
> 
> [1] - https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174221&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174221
> [2] - https://issues.apache.org/jira/browse/SOLR-14354?focusedCommentId=17174234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17174234
> 
> 
