2010YOUY01 commented on issue #14535:
URL: https://github.com/apache/datafusion/issues/14535#issuecomment-2712646798

   > Hello, I am interested in applying to work on this project for GSoC. After 
reading through [#11030](https://github.com/apache/datafusion/issues/11030) , 
it looks like the three testing oracles that have been implemented from 
SQLancer are NoREC, TLP, and PQS. Were those chosen because they were the 
easiest to implement, or was there something about how they test Datafusion 
specifically?
   
   👋🏼 They were implemented first because:
   - All of them are general-purpose algorithms that should work on most 
systems:
     - TLP: it targets the handling of edge-case values (NULLs), so it should 
work well for DataFusion.
     - NoREC: it checks the consistency between the optimized path (with 
predicate pushdown) and the non-optimized path, and also between how the same 
predicate is evaluated in `select expr` and in `where expr`. It has also 
caught several bugs in DataFusion.
     - PQS: I don't have a good intuition for why this one should work, and I 
think it has caught 0 or 1 bugs 🤦🏼 Perhaps I'm missing something.
   - Yes, they're also very easy to implement.
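
To make the NoREC and TLP checks above concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in engine, not DataFusion, and the helper names (`make_demo_table`, `norec_check`, `tlp_check`) and the table `t` are hypothetical, chosen only for illustration. It also assumes the predicate is a boolean expression.

```python
import sqlite3


def make_demo_table():
    """Build a small in-memory table with NULLs (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(1, 10), (2, None), (None, 30), (4, 40)],
    )
    return conn


def norec_check(conn, table, predicate):
    """NoREC-style check: the row count from a WHERE filter (which the
    engine may optimize) must match the count obtained by projecting the
    predicate for every row and counting the true values on the client."""
    filtered = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {predicate}"
    ).fetchone()[0]
    projected = sum(
        1
        for (value,) in conn.execute(f"SELECT ({predicate}) FROM {table}")
        if value  # NULL (None) and 0 both count as "not true"
    )
    return filtered == projected


def tlp_check(conn, table, predicate):
    """TLP-style check: every row satisfies exactly one of p, NOT p, or
    p IS NULL, so the three partition counts must add up to the total."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    partitions = [
        f"({predicate})",
        f"NOT ({predicate})",
        f"({predicate}) IS NULL",
    ]
    partitioned = sum(
        conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {part}"
        ).fetchone()[0]
        for part in partitions
    )
    return total == partitioned
```

On a correct engine both checks return `True` for any boolean predicate, for example `a > 1 AND b < 40` on the demo table; a `False` result points at a possible logic bug.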
   
   To make fuzzing more specific to DataFusion, I think what's most needed is 
configuration fuzzing or data source fuzzing. To make `DataFusion` more 
performant, the same executor has many specialized execution paths, and 
they're controlled by the configuration knobs listed in 
https://datafusion.apache.org/user-guide/configs.html
   However, they're quite hard to implement because of the complexity. Given a 
randomly generated query, if we pick a random value for every option, the 
query is very likely to fail because the configuration is invalid. I believe 
that for a specific query only a small subset of the options is relevant. 
So for now we implement this kind of configuration fuzzing separately, for example 
https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fuzz_cases/aggregation_fuzzer/mod.rs
   I wish configuration fuzzing could be integrated into `datafusion-sqlancer`, 
but it still has a long way to go.
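
As a rough illustration of the idea of fuzzing only the options relevant to a query, here is a small sketch. The two option names are taken from the DataFusion configuration page, but the candidate values, the toy engine `toy_grouped_sum`, and the driver `fuzz_configs` are all hypothetical stand-ins; this is not how the aggregation fuzzer linked above is implemented.

```python
import random

# Option names exist on the DataFusion configuration page; the candidate
# values and the choice of "relevant" options are assumptions.
RELEVANT_KNOBS = {
    "datafusion.execution.batch_size": [1, 2, 3, 8192],
    "datafusion.execution.target_partitions": [1, 2, 4, 16],
}


def toy_grouped_sum(rows, config):
    """Hypothetical stand-in for an engine: a GROUP BY sum that splits the
    input into partitions and processes each partition in batches."""
    batch_size = config["datafusion.execution.batch_size"]
    partitions = config["datafusion.execution.target_partitions"]
    totals = {}
    for p in range(partitions):
        part = rows[p::partitions]  # round-robin partitioning
        for start in range(0, len(part), batch_size):
            for key, value in part[start:start + batch_size]:
                totals[key] = totals.get(key, 0) + value
    return totals


def fuzz_configs(rows, rounds=20, seed=0):
    """Run the same query under random values of the relevant options and
    return every configuration whose result differs from the baseline."""
    rng = random.Random(seed)
    baseline_config = {k: v[0] for k, v in RELEVANT_KNOBS.items()}
    baseline = toy_grouped_sum(rows, baseline_config)
    failures = []
    for _ in range(rounds):
        config = {k: rng.choice(v) for k, v in RELEVANT_KNOBS.items()}
        if toy_grouped_sum(rows, config) != baseline:
            failures.append(config)
    return failures
```

Restricting the sampled options to a hand-picked set keeps every generated configuration valid for the query, which is the difficulty described above; an empty list from `fuzz_configs` means all sampled configurations agreed with the baseline.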


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


