[GitHub] [arrow-ballista] explicite opened a new issue, #802: Data quality framework

via GitHub Mon, 05 Jun 2023 02:19:55 -0700


explicite opened a new issue, #802:
URL: https://github.com/apache/arrow-ballista/issues/802


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   (This section helps Arrow developers understand the context and *why* for 
this feature, in addition to  the *what*)
   
   **Describe the solution you'd like**
   When building a DAG of transformations, I want to be able to define `tests` that 
verify data correctness. At the end of the DAG, I should be able to review 
data quality and provide context to the end user if required.
   
   As in [Deequ](https://github.com/awslabs/deequ), I could check that all IDs 
are unique or that the values in a column follow the expected format. Other 
approaches include [AWS 
Glue](https://docs.aws.amazon.com/glue/latest/ug/gs-data-quality-chapter.html), 
[dbt tests](https://docs.getdbt.com/docs/build/tests), and [Great 
Expectations](https://github.com/great-expectations/great_expectations).
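
To make the request more concrete, here is a rough sketch of the two checks mentioned above (unique IDs and a column format check), applied to plain in-memory rows. It does not use Ballista, DataFusion, or any of the linked tools; every name in it (`CheckResult`, `is_unique`, `matches_pattern`, `run_checks`) is hypothetical and only illustrates the shape such `tests` might take.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for the output of one DAG stage: a list of row dicts.
Rows = Sequence[Dict[str, object]]


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: int


Check = Callable[[Rows], CheckResult]


def is_unique(column: str) -> Check:
    """Build a check that fails when `column` contains duplicate values."""
    def check(rows: Rows) -> CheckResult:
        values = [row[column] for row in rows]
        duplicates = len(values) - len(set(values))
        return CheckResult(f"is_unique({column})", duplicates == 0, duplicates)
    return check


def matches_pattern(column: str, pattern: str) -> Check:
    """Build a check that fails when a value does not fully match `pattern`."""
    compiled = re.compile(pattern)

    def check(rows: Rows) -> CheckResult:
        bad = sum(1 for row in rows if not compiled.fullmatch(str(row[column])))
        return CheckResult(f"matches_pattern({column})", bad == 0, bad)
    return check


def run_checks(rows: Rows, checks: Sequence[Check]) -> List[CheckResult]:
    """Run every check and collect the results into a simple report."""
    return [check(rows) for check in checks]


# Made-up sample data: one duplicate id and one badly formatted date.
rows = [
    {"id": 1, "date": "2023-06-05"},
    {"id": 2, "date": "2023-06-06"},
    {"id": 2, "date": "06/07/2023"},
]
report = run_checks(
    rows,
    [is_unique("id"), matches_pattern("date", r"\d{4}-\d{2}-\d{2}")],
)
for result in report:
    print(result)
```

In a real implementation the checks would presumably run against the distributed query results rather than Python lists, and the report would be what gets surfaced to the end user at the end of the DAG.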
   
   **Describe alternatives you've considered**
   Instead of building a new framework, it may be possible to extend [Great 
Expectations](https://github.com/great-expectations/great_expectations).
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

