We run CometFuzz manually. However, many of our more recent Scala tests use the same underlying classes to generate random data for a specified schema, and we cover all the usual edge cases there (NaN, Infinity, -0.0, etc.).
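
To illustrate the idea, here is a minimal Python sketch. It is not CometFuzz's or Comet's actual code: every name in it (`EDGE_CASES`, `random_doubles`, `same_double`, `results_match`) is hypothetical. It mixes the edge cases listed above into randomly generated doubles, then compares two result sets with an equality that treats NaN as equal to NaN and tells 0.0 apart from -0.0. Whether a real harness should be that strict about signed zero is an assumption made here.

```python
import math
import random

# Hypothetical edge-case pool for illustration; not taken from the Comet code base.
EDGE_CASES = [float("nan"), float("inf"), float("-inf"), 0.0, -0.0]


def random_doubles(n, seed, edge_ratio=0.2):
    """Generate n doubles, picking an edge case with probability edge_ratio."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    values = []
    for _ in range(n):
        if rng.random() < edge_ratio:
            values.append(rng.choice(EDGE_CASES))
        else:
            values.append(rng.uniform(-1e6, 1e6))
    return values


def same_double(a, b):
    """Equality where NaN matches NaN and 0.0 does not match -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    # copysign exposes the sign bit, which plain == ignores for zeros
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def results_match(expected, actual):
    """Compare two result columns element by element."""
    return len(expected) == len(actual) and all(
        same_double(e, a) for e, a in zip(expected, actual)
    )
```

In a fuzz run, `expected` would hold the values returned with the native engine disabled and `actual` the values returned with it enabled. A plain `==` comparison would report a spurious mismatch for every NaN and miss a 0.0 versus -0.0 difference.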
On Thu, Dec 18, 2025 at 9:04 AM James Xu <[email protected]> wrote:

> Thanks Andy, very informative!
>
> For the fuzz testing, does Comet run in CI or just manually?
>
> On 2025/12/18 14:15:04 Andy Grove wrote:
> > I'd like to share some quick notes about our experiences with correctness
> > testing in Apache DataFusion Comet.
> >
> > We run the full Spark SQL test suite and it has caught many bugs. One issue
> > is that many of the tests explicitly check that plans contain specific
> > operators such as ProjectExec, so we have to modify those tests to also
> > accept CometProjectExec. We have to do this for multiple Spark versions
> > too. The approach we took is to maintain diff files in the Comet repo that
> > we apply to Spark to modify the tests. You can read about our approach in
> > the contributor guide documentation [1].
> >
> > We also developed CometFuzz [2], a fuzz testing tool that generates random
> > Parquet files and random queries and then runs those queries with Comet
> > disabled, then enabled, and compares the results. This tool actually has no
> > dependencies on Comet and you could use it with Auron as well.
> >
> > I hope this is helpful.
> >
> > Thanks,
> >
> > Andy.
> >
> > [1] https://datafusion.apache.org/comet/contributor-guide/spark-sql-tests.html
> > [2] https://github.com/apache/datafusion-comet/tree/main/fuzz-testing
> >
> > On Thu, Dec 18, 2025 at 1:15 AM James Xu <[email protected]> wrote:
> >
> > > Hi Mang Zhang,
> > >
> > > For question 1: Yes, there will be large amount of Spark test file, but
> > > most of the code is simply some chore work to inherit the vanilla Spark
> > > test with native engine enabled, we are NOT copying the Spark tests into
> > > Auron code. We will depend on the Spark's test JAR as you mentioned, but we
> > > need to do some inheritance, enable the native engine, and sometimes
> > > disable some tests(due to Auron's bug).
> > >
> > > For question 2: First the test code of a specific released version of
> > > Spark will not change. And if there is change in Spark, e.g. some bugs are
> > > fixed, Spark change, we also need to change, this is the purpose of
> > > correctness testing.
> > >
> > > On 2025/12/18 04:18:02 Mang Zhang wrote:
> > > > Hi James,
> > > > Thanks for driving this. +1!
> > > > I see the proposal mentions migrating a large amount of Spark test code.
> > > > There are two issues here:
> > > > 1. There will be a significant amount of migration work.
> > > > 2. Code maintenance work: If Spark code changes, Auron may also require
> > > > corresponding modifications.
> > > >
> > > > Can we achieve our testing objectives by introducing Spark's test JAR?
> > > > This approach would only require updating Spark's dependency version,
> > > > saving us a significant amount of work.
> > > > Similarly, Flink could adopt the same pattern in the future.
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Mang Zhang
> > > >
> > > > At 2025-12-18 11:38:17, "Shreyesh Arangath" <[email protected]> wrote:
> > > > >Great effort! Thanks for driving this. +1!
> > > > >
> > > > >PS: Could you also provide comment access for the document so that we can
> > > > >ask questions? Thanks
> > > > >
> > > > >Best,
> > > > >Shreyesh
> > > > >
> > > > >On Tue, Dec 16, 2025 at 8:05 AM James <[email protected]> wrote:
> > > > >
> > > > >> Hi, everyone I'd like to start a discussion about AIP-2: Enhance Auron’s
> > > > >> Correctness Testing [1]. Looking forward to your feedback.
> > > > >>
> > > > >> [1]. https://docs.google.com/document/d/1v8wMyLZXuA7tmDSJysAo8CRqWX36SWR5ZgjyOMSfm94/edit?tab=t.0
