jsai28 opened a new issue, #15483: URL: https://github.com/apache/datafusion/issues/15483
Would there be any interest in building a data quality framework like [Great Expectations](https://github.com/great-expectations/great_expectations) or [Deequ](https://github.com/awslabs/deequ) (built on Spark), but in Rust using DataFusion? As far as I am aware, nothing like this exists in Rust, let alone built on DataFusion.

The idea is essentially a Rust-based tool for specifying unit-like tests for your data. Users would specify tests (called expectations in Great Expectations), and DataFusion would then be used for the underlying metric computation. Essentially something like this:

```
fn main() {
    // `create_validator` and the methods below are a proposed API, not an existing one
    let validator = create_validator("example.csv");
    validator.is_not_null("id"); // specify column for null check
    validator.min_value("price", 0); // specify column and minimum value
    validator.validate();
}
```

Which could return an output like this:

```
✅ id: Passed (All values not null)
❌ price: Failed (2 values below 0)
```

It would be a pretty niche tool that could be a part of a larger data pipeline. I was thinking it could be a good project to work on for GSoC. What do you think?
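
To make the "DataFusion does the metric computation" part a bit more concrete, here is a rough sketch of how each expectation might be lowered to a SQL query that counts violating rows. Everything in it (`Expectation`, `Validator`, `violation_count_sql`, `quote_ident`, the table name `example`) is hypothetical and only illustrates the shape of the idea. It uses the standard library alone and only builds the query strings; actually running them is left out.

```rust
/// Hypothetical expectation kinds; the names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Expectation {
    IsNotNull { column: String },
    MinValue { column: String, min: f64 },
}

/// Quote an identifier for SQL, doubling any embedded double quotes.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

impl Expectation {
    /// Build a query that counts the rows in `table` violating this expectation.
    fn violation_count_sql(&self, table: &str) -> String {
        match self {
            Expectation::IsNotNull { column } => format!(
                "SELECT COUNT(*) AS violations FROM {} WHERE {} IS NULL",
                quote_ident(table),
                quote_ident(column)
            ),
            Expectation::MinValue { column, min } => format!(
                "SELECT COUNT(*) AS violations FROM {} WHERE {} < {}",
                quote_ident(table),
                quote_ident(column),
                min
            ),
        }
    }
}

/// Hypothetical validator that only collects expectations for one table.
struct Validator {
    table: String,
    expectations: Vec<Expectation>,
}

impl Validator {
    fn new(table: &str) -> Self {
        Validator { table: table.to_string(), expectations: Vec::new() }
    }

    /// Expect that `column` contains no NULL values.
    fn is_not_null(&mut self, column: &str) -> &mut Self {
        self.expectations.push(Expectation::IsNotNull { column: column.to_string() });
        self
    }

    /// Expect that every value in `column` is at least `min`.
    fn min_value(&mut self, column: &str, min: f64) -> &mut Self {
        self.expectations.push(Expectation::MinValue { column: column.to_string(), min });
        self
    }

    /// One violation-count query per registered expectation, in insertion order.
    fn queries(&self) -> Vec<String> {
        self.expectations
            .iter()
            .map(|e| e.violation_count_sql(&self.table))
            .collect()
    }
}

fn main() {
    let mut validator = Validator::new("example");
    validator.is_not_null("id").min_value("price", 0.0);
    for sql in validator.queries() {
        println!("{sql}");
    }
}
```

In a real tool, I would expect the CSV to be registered as a table (for example with `SessionContext::register_csv`) and each query to be run through `SessionContext::sql`, with a count of zero meaning the expectation passed. Building logical expressions directly instead of SQL strings might well be the better design; I used strings here only to keep the sketch short.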
