Thanks for your detailed response to this. I am definitely not fixed on using Carrot for this, so I am happy to look at a replacement. I wasn't aware of the addition of QuickTheories or CassandraGenerators. A combination of these could easily supply the functionality we need for the SAI testing. The Generators could definitely replace the functionality in SAIRandomizedTest.
I will take a look at these and see if we can work without the Carrot generators, and will report back on this thread in a couple of days if I can do this easily.

As an aside, Caleb and I have already spoken about adding support to Harry for SAI and using this for more large-scale randomized testing of SAI.

On Tue, 13 Dec 2022 at 18:24, Josh McKenzie <jmcken...@apache.org> wrote:

> Whatever we decide on, let's make sure we document it so newcomers on the
> project (or really anyone new to property based testing) can better
> discover those things.
>
> https://cassandra.apache.org/_/development/testing.html
>
> On Tue, Dec 13, 2022, at 1:08 PM, David Capwell wrote:
>
> Speaking to Caleb in Slack, so putting the main comments I have there here…
>
> I am not -1 on this new dependency, but more asking what we should use for
> random testing moving forward…. ATM we have the following:
>
> 1) QuickTheories - I feel like I am the only user at this point…
> 2) 1-off - many reinvent random testing for a specific class; using
> Random, ThreadLocalRandom, UUID.randomUUID(), and lang3 classes (such
> as org.apache.commons.lang3.RandomUtils)
> 3) Harry - even though the main API is for cluster testing, this is built
> on-top of random generation so could be used for low level random testing
> (just less fleshed out for this use-case)
> 4) Simulator - same as Harry, built on top of a random generator and not
> fleshed out for low level random testing
>
> Another reason I ask this is I have a fuzz test that I have developed
> for Accord testing that generates random valid CQL statements to make sure
> we “do the right thing” and have been struggling with the question “where
> do I put this” and “what random do I use?”.
> I built this off QuickTheories
> as I have a lot of utilities for building all supported Tables and Types so
> really quick to bootstrap, and every other random testing thing we have is
> less fleshed out… so if we add yet another random testing library what
> “should” we be using? Do we build on-top of it to get to the same level
> QuickTheories is
> (see org.apache.cassandra.utils.Generators,
> org.apache.cassandra.utils.CassandraGenerators,
> and org.apache.cassandra.utils.AbstractTypeGenerators)?
>
> On Dec 13, 2022, at 9:21 AM, Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
>
> We need random generators no matter what for these tests, so I think what
> we need to decide is whether to continue to use Carrot or migrate those to
> QuickTheories, along the lines of what we have now in
> org.apache.cassandra.utils.Generators.
>
> When it comes to a library like this, the thing I would optimize for is
> how much it already provides (and therefore how much we need to write and
> maintain ourselves). If you look at something like NumericTypeSortingTest
> in the 18058 branch <https://github.com/maedhroz/cassandra/pull/6>, it's
> pretty compact w/ Carrot's RandomizedTest in use, but I suppose it could
> also use IntegersDSL from QT...
>
> (Not that it matters, but just for reference, we do use
> com.carrotsearch.hppc already.)
>
> On Tue, Dec 13, 2022 at 10:14 AM Mike Adamson <madam...@datastax.com>
> wrote:
>
> Can you talk more about why? There are several ways to do random testing
> in-tree ATM, so wondering why we need another one
>
>
> I can see one mechanism for random testing in-tree. That is the Simulator
> but that seems primarily involved in the random orchestration of
> operations. My apologies if I have simplified its significance. Apart from
> that, I can only see different usages of Random in unit tests. I admit I
> have not looked beyond this at dtests.
>
> The random testing in SAI is more focussed on the behaviour of the
> low-level index structures and flow of data to / from these. Using randomly
> generated values in tests has proved invaluable in highlighting edge
> conditions in the code. The above library was only added to provide us
> with a rich set of random generators. I am happy to look at removing this
> library if its inclusion is contentious.
>
>
> On Mon, 12 Dec 2022 at 19:41, David Capwell <dcapw...@apple.com> wrote:
>
> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test
> dependency
>
>
> Can you talk more about why? There are several ways to do random testing
> in-tree ATM, so wondering why we need another one
>
>
> On Dec 8, 2022, at 6:51 AM, Mike Adamson <madam...@datastax.com> wrote:
>
> Hi,
>
> I wanted to discuss the addition of the following dependencies for CEP-7.
> The dependencies are:
>
> org.apache.lucene.lucene-core 7.5.0
> org.apache.lucene.lucene-analyzers-common 7.5.0
> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test
> dependency
>
> Lucene is an apache project so is licensed APL2. Carrotsearch is not an
> apache project but is licensed APL2
>
> We are also removing the dependency
> on com.github.rholder.snowball-stemmer. This library is used by SASI
> stemming filters but a later version of the same library is available in
> the lucene libraries.
>
> Does anyone have any concerns about these changes?
>
> Mike Adamson

--
Mike Adamson
Engineering
datastax.com <https://www.datastax.com/>