Re: [DISCUSSION] New dependencies for SAI CEP-7

Mike Adamson Wed, 14 Dec 2022 01:46:53 -0800

Thanks for your detailed response to this. I am definitely not fixed on
using carrot for this so am happy to look at a replacement. I wasn't aware
of the addition of QuickTheories or CassandraGenerators. A combination of
these could easily supply the functionality we need for the SAI testing.
The Generators could definitely replace the functionality in
SAIRandomizedTest.


I will take a look at these and see if we can work without the carrot
generators and will report back in a couple of days on this thread if I can
do this easily.

As an aside, Caleb and I have already spoken about adding support to Harry
for SAI and using this for more large-scale randomized testing of SAI.
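
To make the kind of test under discussion concrete, here is a minimal sketch
of a one-off seeded random test that uses only java.util.Random. It is an
illustration, not code from the SAI branch: the class name, the method names
and the sign-flip encoding are all hypothetical. It checks that a
byte-comparable encoding of ints sorts in the same order as the ints
themselves, and it prints the seed so a failure can be replayed.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomSortingSketch
{
    // Hypothetical encoding (an assumption, not SAI's actual format): flip the
    // sign bit and write big-endian, so that unsigned lexicographic byte order
    // matches signed numeric order.
    static byte[] encode(int value)
    {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Compares `iterations` random pairs and returns the number of pairs whose
    // encoded order disagrees with their numeric order.
    static int check(long seed, int iterations)
    {
        Random random = new Random(seed);
        int failures = 0;
        for (int i = 0; i < iterations; i++)
        {
            int a = random.nextInt();
            int b = random.nextInt();
            int expected = Integer.signum(Integer.compare(a, b));
            int actual = Integer.signum(Arrays.compareUnsigned(encode(a), encode(b)));
            if (expected != actual)
            {
                failures++;
                System.out.printf("seed=%d a=%d b=%d expected=%d actual=%d%n",
                                  seed, a, b, expected, actual);
            }
        }
        return failures;
    }

    public static void main(String[] args)
    {
        // Pass a seed as the first argument to replay an earlier run.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed=" + seed + " failures=" + check(seed, 10_000));
    }
}
```

A generator library would mostly replace the random.nextInt() calls and the
hand-written seed handling; the property being checked would stay the same.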

On Tue, 13 Dec 2022 at 18:24, Josh McKenzie <[email protected]> wrote:

> Whatever we decide on, let's make sure we document it so newcomers on the
> project (or really anyone new to property based testing) can better
> discover those things.
>
> https://cassandra.apache.org/_/development/testing.html
>
> On Tue, Dec 13, 2022, at 1:08 PM, David Capwell wrote:
>
> Speaking to Caleb in Slack, so putting the main comments I have there here…
>
> I am not -1 on this new dependency, but more asking what we should use for
> random testing moving forward…. ATM we have the following:
>
> 1) QuickTheories - I feel like I am the only user at this point…
> 2) 1-off - many reinvent random testing for a specific class; using
> Random, ThreadLocalRandom, UUID.randomUUID(), and lang3 classes (such
> as org.apache.commons.lang3.RandomUtils)
> 3) Harry - even though the main API is for cluster testing, this is built
> on top of random generation so could be used for low-level random testing
> (just less fleshed out for this use case)
> 4) Simulator - same as Harry, built on top of a random generator and not
> fleshed out for low level random testing
>
> Another reason I ask this is I have a fuzz test that I have developed
> for Accord testing that generates random valid CQL statements to make sure
> we “do the right thing” and have been struggling with the question “where
> do I put this” and “what random do I use?”.  I built this off QuickTheories
> as I have a lot of utilities for building all supported Tables and Types so
> it was really quick to bootstrap, and every other random testing thing we
> have is less fleshed out… so if we add yet another random testing library
> what “should” we be using?  Do we build on top of it to get to the same
> level QuickTheories is at
> (see org.apache.cassandra.utils.Generators,
> org.apache.cassandra.utils.CassandraGenerators,
> and org.apache.cassandra.utils.AbstractTypeGenerators)?
>
> On Dec 13, 2022, at 9:21 AM, Caleb Rackliffe <[email protected]>
> wrote:
>
> We need random generators no matter what for these tests, so I think what
> we need to decide is whether to continue to use Carrot or migrate those to
> QuickTheories, along the lines of what we have now in
> org.apache.cassandra.utils.Generators.
>
> When it comes to a library like this, the thing I would optimize for is
> how much it already provides (and therefore how much we need to write and
> maintain ourselves). If you look at something like NumericTypeSortingTest
> in the 18058 branch <https://github.com/maedhroz/cassandra/pull/6>, it's
> pretty compact w/ Carrot's RandomizedTest in use, but I suppose it could
> also use IntegersDSL from QT...
>
> (Not that it matters, but just for reference, we do use
> com.carrotsearch.hppc already.)
>
> On Tue, Dec 13, 2022 at 10:14 AM Mike Adamson <[email protected]>
> wrote:
>
> > Can you talk more about why?  There are several ways to do random testing
> > in-tree ATM, so wondering why we need another one
>
>
> I can see one mechanism for random testing in-tree. That is the Simulator
> but that seems primarily involved in the random orchestration of
> operations. My apologies if I have simplified its significance. Apart from
> that, I can only see different usages of Random in unit tests. I admit I
> have not looked beyond this at dtests.
>
> The random testing in SAI is more focussed on the behaviour of the
> low-level index structures and flow of data to / from these. Using randomly
> generated values in tests has proved invaluable in highlighting edge
> conditions in the code. The above library was only added to provide us
> with a rich set of random generators. I am happy to look at removing this
> library if its inclusion is contentious.
>
>
> On Mon, 12 Dec 2022 at 19:41, David Capwell <[email protected]> wrote:
>
> > com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test
> > dependency
>
>
> Can you talk more about why?  There are several ways to do random testing
> in-tree ATM, so wondering why we need another one
>
>
> On Dec 8, 2022, at 6:51 AM, Mike Adamson <[email protected]> wrote:
>
> Hi,
>
> I wanted to discuss the addition of the following dependencies for CEP-7.
> The dependencies are:
>
> org.apache.lucene.lucene-core 7.5.0
> org.apache.lucene.lucene-analyzers-common 7.5.0
> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test
> dependency
>
> Lucene is an Apache project so is licensed under the Apache License 2.0.
> Carrotsearch is not an Apache project but is licensed under the Apache
> License 2.0 as well.
>
> We are also removing the dependency
> on com.github.rholder.snowball-stemmer. This library is used by SASI
> stemming filters but a later version of the same library is available in
> the lucene libraries.
>
> Does anyone have any concerns about these changes?
>
> Mike Adamson

-- 
Mike Adamson
Engineering
+1 650 389 6000 | datastax.com <https://www.datastax.com/>
