Thanks Andy, very informative! Regarding the fuzz testing, does Comet run it in CI, or only manually?
On 2025/12/18 14:15:04 Andy Grove wrote:
> I'd like to share some quick notes about our experiences with correctness
> testing in Apache DataFusion Comet.
>
> We run the full Spark SQL test suite and it has caught many bugs. One issue
> is that many of the tests explicitly check that plans contain specific
> operators such as ProjectExec, so we have to modify those tests to also
> accept CometProjectExec. We have to do this for multiple Spark versions
> too. The approach we took is to maintain diff files in the Comet repo that
> we apply to Spark to modify the tests. You can read about our approach in
> the contributor guide documentation [1].
>
> We also developed CometFuzz [2], a fuzz testing tool that generates random
> Parquet files and random queries and then runs those queries with Comet
> disabled, then enabled, and compares the results. This tool actually has no
> dependencies on Comet and you could use it with Auron as well.
>
> I hope this is helpful.
>
> Thanks,
>
> Andy.
>
> [1] https://datafusion.apache.org/comet/contributor-guide/spark-sql-tests.html
> [2] https://github.com/apache/datafusion-comet/tree/main/fuzz-testing
>
> On Thu, Dec 18, 2025 at 1:15 AM James Xu wrote:
>
> > Hi Mang Zhang,
> >
> > For question 1: Yes, there will be a large number of Spark test files, but
> > most of that code is simply chore work to inherit the vanilla Spark tests
> > with the native engine enabled; we are NOT copying the Spark tests into
> > the Auron code. We will depend on Spark's test JAR as you mentioned, but
> > we need to do some inheritance, enable the native engine, and sometimes
> > disable some tests (due to Auron bugs).
> >
> > For question 2: First, the test code of a specific released version of
> > Spark will not change. And if Spark changes, e.g. some bugs are fixed,
> > we also need to change accordingly; catching that is the purpose of
> > correctness testing.
> >
> > On 2025/12/18 04:18:02 Mang Zhang wrote:
> > > Hi James,
> > > Thanks for driving this. +1!
> > > I see the proposal mentions migrating a large amount of Spark test code.
> > > There are two issues here:
> > > 1. There will be a significant amount of migration work.
> > > 2. Code maintenance work: if Spark code changes, Auron may also require
> > > corresponding modifications.
> > >
> > > Can we achieve our testing objectives by introducing Spark's test JAR?
> > > This approach would only require updating Spark's dependency version,
> > > saving us a significant amount of work.
> > > Similarly, Flink could adopt the same pattern in the future.
> > >
> > > --
> > >
> > > Best regards,
> > > Mang Zhang
> > >
> > > At 2025-12-18 11:38:17, "Shreyesh Arangath" wrote:
> > > >Great effort! Thanks for driving this. +1!
> > > >
> > > >PS: Could you also provide comment access for the document so that we
> > > >can ask questions? Thanks
> > > >
> > > >Best,
> > > >Shreyesh
> > > >
> > > >On Tue, Dec 16, 2025 at 8:05 AM James wrote:
> > > >
> > > >> Hi everyone, I'd like to start a discussion about AIP-2: Enhance
> > > >> Auron’s Correctness Testing [1]. Looking forward to your feedback.
> > > >>
> > > >> [1] https://docs.google.com/document/d/1v8wMyLZXuA7tmDSJysAo8CRqWX36SWR5ZgjyOMSfm94/edit?tab=t.0
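For anyone new to the differential approach Andy describes for CometFuzz (run the same query with the native engine disabled, then enabled, and compare), here is a minimal Python sketch of just the comparison step. It is not CometFuzz's code. `results_match`, `differential_test`, `run_baseline` and `run_candidate` are hypothetical names, and it assumes each engine's result arrives as a list of plain Python tuples.

```python
import math
from collections import Counter


def _normalize_value(value, ndigits):
    # Hypothetical helper: make harmless engine differences compare equal.
    if isinstance(value, float):
        if math.isnan(value):
            return ("NaN",)  # NaN != NaN, so map it to a comparable sentinel
        return round(value, ndigits)  # absorb low-order floating-point drift
    return value


def _normalize_row(row, ndigits):
    return tuple(_normalize_value(v, ndigits) for v in row)


def normalize_rows(rows, ndigits=6):
    # Multiset of normalized rows: row order is ignored, but duplicate
    # rows must still appear the same number of times on both sides.
    return Counter(_normalize_row(r, ndigits) for r in rows)


def results_match(expected, actual, ordered=False, ndigits=6):
    if ordered:  # e.g. the query has an ORDER BY, so order is part of the result
        return len(expected) == len(actual) and all(
            _normalize_row(e, ndigits) == _normalize_row(a, ndigits)
            for e, a in zip(expected, actual)
        )
    return normalize_rows(expected, ndigits) == normalize_rows(actual, ndigits)


def differential_test(query, run_baseline, run_candidate, ordered=False, ndigits=6):
    """Run `query` on both engines; return None if they agree, else a diff report.

    `run_baseline` / `run_candidate` are hypothetical callables standing in for
    "execute with the native engine disabled" and "execute with it enabled".
    """
    expected = run_baseline(query)
    actual = run_candidate(query)
    if results_match(expected, actual, ordered, ndigits):
        return None
    exp = normalize_rows(expected, ndigits)
    act = normalize_rows(actual, ndigits)
    # Both Counters are empty when only the row order differs (ordered=True).
    return {"query": query, "missing": exp - act, "unexpected": act - exp}
```

For example, `results_match([(1, 0.1 + 0.2), (2, None)], [(2, None), (1, 0.3)])` is true, while the same call with `ordered=True` is false. Rounding to a fixed number of digits keeps rows hashable so a `Counter` can do the unordered comparison, but it can misfire for values that straddle a rounding boundary; sorting both sides and comparing with a relative tolerance would be more robust at the cost of a little more code.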
