Re: [DISCUSSION] New dependencies for SAI CEP-7

Josh McKenzie Tue, 13 Dec 2022 10:24:26 -0800

Whatever we decide on, let's make sure we document it so newcomers to the 
project (or really anyone new to property-based testing) can more easily 
discover what is available.


https://cassandra.apache.org/_/development/testing.html

On Tue, Dec 13, 2022, at 1:08 PM, David Capwell wrote:
> I was speaking to Caleb in Slack, so I'm putting my main comments from there here…
> 
> I am not -1 on this new dependency; I'm more asking what we should use for 
> random testing moving forward… ATM we have the following:
> 
> 1) QuickTheories - I feel like I am the only user at this point…
> 2) One-offs - many tests reinvent random testing for a specific class, using 
> Random, ThreadLocalRandom, UUID.randomUUID(), and lang3 classes (such as 
> org.apache.commons.lang3.RandomUtils)
> 3) Harry - even though the main API is for cluster testing, it is built 
> on top of random generation, so it could be used for low-level random testing 
> (just less fleshed out for this use case)
> 4) Simulator - same as Harry, built on top of a random generator and not 
> fleshed out for low-level random testing
> 
> Another reason I ask is that I have developed a fuzz test for Accord that 
> generates random valid CQL statements to make sure we “do the right thing”, 
> and I have been struggling with the questions “where do I put this?” and 
> “what random do I use?”.  I built this off QuickTheories as I have a lot of 
> utilities for building all supported Tables and Types, so it was really quick 
> to bootstrap, and every other random testing option we have is less fleshed 
> out… so if we add yet another random testing library, what “should” we be 
> using?  Do we build on top of it to get to the same level as our QuickTheories 
> support (see org.apache.cassandra.utils.Generators, 
> org.apache.cassandra.utils.CassandraGenerators, and 
> org.apache.cassandra.utils.AbstractTypeGenerators)?
> 
>> On Dec 13, 2022, at 9:21 AM, Caleb Rackliffe <calebrackli...@gmail.com> 
>> wrote:
>> 
>> We need random generators no matter what for these tests, so I think what we 
>> need to decide is whether to continue to use Carrot or migrate those to 
>> QuickTheories, along the lines of what we have now in 
>> org.apache.cassandra.utils.Generators.
>> 
>> When it comes to a library like this, the thing I would optimize for is how 
>> much it already provides (and therefore how much we need to write and 
>> maintain ourselves). If you look at something like NumericTypeSortingTest in 
>> the 18058 branch <https://github.com/maedhroz/cassandra/pull/6>, it's pretty 
>> compact w/ Carrot's RandomizedTest in use, but I suppose it could also use 
>> IntegersDSL from QT...
>> 
>> (Not that it matters, but just for reference, we do use 
>> com.carrotsearch.hppc already.)
>> 
>> On Tue, Dec 13, 2022 at 10:14 AM Mike Adamson <madam...@datastax.com> wrote:
>>>> Can you talk more about why?  There are several ways to do random testing 
>>>> in-tree ATM, so wondering why we need another one
>>> 
>>> I can see one mechanism for random testing in-tree. That is the Simulator 
>>> but that seems primarily involved in the random orchestration of 
>>> operations. My apologies if I have simplified its significance. Apart from 
>>> that, I can only see different usages of Random in unit tests. I admit I 
>>> have not looked beyond this at dtests.
>>> 
>>> The random testing in SAI is more focussed on the behaviour of the 
>>> low-level index structures and flow of data to / from these. Using randomly 
>>> generated values in tests has proved invaluable in highlighting edge 
>>> conditions in the code. The above library was only added to provide us 
>>> with a rich set of random generators. I am happy to look at removing this 
>>> library if its inclusion is contentious.
>>> 
>>> 
>>> On Mon, 12 Dec 2022 at 19:41, David Capwell <dcapw...@apple.com> wrote:
>>>>> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test 
>>>>> dependency
>>>> 
>>>> Can you talk more about why?  There are several ways to do random testing 
>>>> in-tree ATM, so wondering why we need another one
>>>> 
>>>> 
>>>>> On Dec 8, 2022, at 6:51 AM, Mike Adamson <madam...@datastax.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I wanted to discuss the addition of the following dependencies for CEP-7. 
>>>>> The dependencies are:
>>>>> 
>>>>> org.apache.lucene.lucene-core 7.5.0
>>>>> org.apache.lucene.lucene-analyzers-common 7.5.0
>>>>> com.carrotsearch.randomizedtesting.randomizedtesting-runner 2.1.2 - test 
>>>>> dependency
>>>>> 
>>>>> Lucene is an Apache project, so it is licensed under the Apache License 
>>>>> 2.0. Carrotsearch is not an Apache project but is also licensed under the 
>>>>> Apache License 2.0.
>>>>> 
>>>>> We are also removing the dependency on 
>>>>> com.github.rholder.snowball-stemmer. This library is used by the SASI 
>>>>> stemming filters, but a later version of the same library is available in 
>>>>> the Lucene libraries.
>>>>> 
>>>>> Does anyone have any concerns about these changes?
>>>>> 
>>>>> Mike Adamson
>>>>> 
>>>> 
>>> 
>>> 
