joshua-stauffer opened a new pull request, #58273:
URL: https://github.com/apache/airflow/pull/58273

   ---
   
   This pull request introduces a new provider package, 
apache-airflow-providers-greatexpectations, which integrates Apache Airflow 
with the open-source Great Expectations library for data quality validation.
   
   ## Background / Motivation
   
   Great Expectations (GX Core) is an Apache-licensed Python framework for 
validating, documenting, and profiling data.
   
   This provider builds on existing work from Astronomer’s 
[airflow-provider-great-expectations](https://github.com/astronomer/airflow-provider-great-expectations) 
package.
   
   ## What this PR includes
   - A new provider package following Airflow’s community provider conventions:
     - Operators for running GX Core validations from Airflow DAGs
     - Example DAGs demonstrating typical data-validation workflows
     - Documentation (README, usage guide, API docstrings)
     - Comprehensive test suite including:
       - Open-source GX Core tests that run entirely offline against public 
sample data
       - Optional GX Cloud integration tests that validate the same operators 
using the GX Cloud API
   - Compatibility metadata and constraints:
     - Requires GX Core ≥ 1.7
     - Compatible with Airflow ≥ 2.10 (including 3.0+)
     - Python ≥ 3.10
   - Conforms to ASF licensing and follows the [Providers 
Lifecycle](https://github.com/apache/airflow/blob/main/providers/MANAGING_PROVIDERS_LIFECYCLE.rst?utm_source=chatgpt.com)
   
   
   ## CI / Testing Details
   
   The base provider tests run without external dependencies.
   However, a small subset of integration tests exercises the optional GX Cloud 
functionality.
   To enable these in the Airflow CI environment, three secrets must be 
configured:
   
   ```
   GX_CLOUD_ACCESS_TOKEN
   GX_CLOUD_ORGANIZATION_ID
   GX_CLOUD_WORKSPACE_ID
   ```
   
   These values are used exclusively for automated tests against a dedicated GX 
Cloud workspace.
   If these secrets are not present, the tests are automatically skipped, and 
the rest of the suite will pass normally.
   This approach ensures CI reproducibility while validating both the 
open-source and managed GX Cloud use cases.
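
The skip behavior described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `missing_gx_cloud_vars` and `gx_cloud_skip_reason` are hypothetical, and only the three environment variable names come from the description.

```python
import os

# Environment variable names listed in the PR description.
REQUIRED_GX_CLOUD_VARS = (
    "GX_CLOUD_ACCESS_TOKEN",
    "GX_CLOUD_ORGANIZATION_ID",
    "GX_CLOUD_WORKSPACE_ID",
)


def missing_gx_cloud_vars(environ=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_GX_CLOUD_VARS if not env.get(name)]


def gx_cloud_skip_reason(environ=None):
    """Return a skip message if any secret is missing, else None (hypothetical helper)."""
    missing = missing_gx_cloud_vars(environ)
    if not missing:
        return None
    return "GX Cloud secrets not set: " + ", ".join(missing)
```

A test module could then feed the result into a standard pytest marker, for example `pytest.mark.skipif(gx_cloud_skip_reason() is not None, reason="GX Cloud secrets not set")`, so that the GX Cloud tests are collected but skipped when the secrets are absent.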
   
   ## Maintenance & Commitment
   
   The Great Expectations and Astronomer teams jointly commit to:
   - Ongoing maintenance of the provider (API updates, compatibility fixes, doc 
improvements)
   - Responding to community issues and PRs related to the provider
   - Keeping tests current with future GX Core and Airflow releases
   
   ## Next Steps / Request for Review
   
   We invite the Airflow community and committers to review this contribution 
and provide feedback on:
   - Provider structure and naming
   - Operator and hook interfaces
   - Dependency and packaging strategy
   - CI integration and test design
   - Documentation and examples
   
   Once community feedback is addressed, we propose merging this under 
providers/great_expectations/ and proceeding with the first release as 
apache-airflow-providers-greatexpectations v1.0.0.
   
   Thank you for your review and support!
   We look forward to collaborating with the Airflow community to make this 
integration broadly useful.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
