cetingokhan commented on PR #62963:
URL: https://github.com/apache/airflow/pull/62963#issuecomment-4051051339

   Hi @kaxil and @gopidesupavan,
   
   I’ve reached a solid milestone with the LLMDataQualityOperator and wanted to 
share the current progress. Since LLM-based data quality is a broad topic, I’ve 
implemented several core features, but I’d love your feedback on whether we 
should keep this scope or simplify certain parts.
   
   **Current Implementation Highlights:**
   
   **Foundation:** The operator is built on top of the LLMOperator base.
   
   **Performance Optimization:** Instead of running an individual query for 
every rule, I’ve implemented batch execution. Rules are grouped by table, DQ 
type, and column type, and each group is executed together to minimize overhead.
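   
   A minimal sketch of the grouping step, for discussion only. The `Rule` 
dataclass, its field names, and `group_rules` are hypothetical and not 
necessarily what the PR uses:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape; the PR's real rule model may differ.
    name: str
    table: str
    dq_type: str      # e.g. "null_check", "range_check"
    column_type: str  # e.g. "numeric", "string"


def group_rules(rules):
    """Bucket rules by (table, dq_type, column_type) so each bucket can share one query."""
    groups = defaultdict(list)
    for rule in rules:
        groups[(rule.table, rule.dq_type, rule.column_type)].append(rule)
    return dict(groups)


rules = [
    Rule("orders_id_not_null", "orders", "null_check", "numeric"),
    Rule("orders_total_not_null", "orders", "null_check", "numeric"),
    Rule("users_email_not_null", "users", "null_check", "string"),
]
batches = group_rules(rules)  # two buckets: one for orders, one for users
```

   With this grouping, the two `orders` rules land in one batch and the 
`users` rule in another, so three rules need two queries instead of three.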
   
   **Approval-Ready Output:** When `dry_run=True`, the operator writes a 
Markdown-formatted report to XCom. This makes the manual approval process much 
cleaner and more human-readable in the UI.
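   
   A rough sketch of how such a report could be rendered, assuming each 
result is a dict with `rule`, `table`, and `status` keys. That shape and the 
`render_report` helper are illustrative assumptions, not the PR's format:

```python
def render_report(results):
    """Render rule results as a Markdown table (hypothetical result shape)."""
    lines = ["| Rule | Table | Status |", "| --- | --- | --- |"]
    for r in results:
        lines.append(f"| {r['rule']} | {r['table']} | {r['status']} |")
    return "\n".join(lines)


report = render_report(
    [
        {"rule": "orders_id_not_null", "table": "orders", "status": "planned"},
        {"rule": "users_email_not_null", "table": "users", "status": "planned"},
    ]
)
```

   Returning a string like `report` from an operator's `execute()` is one 
way to get it into XCom, since Airflow pushes the return value by default.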
   
   **Issue Detection (Optional):** Users can opt to generate specific SQL 
queries to identify records that fail validation rules.
   
   **Result Collection:** If `collect_unexpected` is enabled, the operator 
executes the generated SQL for failed rules and lists the problematic records 
in XCom as JSON (capped by a configurable row limit).
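   
   To illustrate the capped collection, here is a small sketch using 
`sqlite3` as a stand-in database. The `collect_unexpected_rows` helper, the 
`limit` parameter name, and the example table are assumptions made for the 
example:

```python
import json
import sqlite3


def collect_unexpected_rows(conn, failing_sql, limit=100):
    """Run the failing-records query and return at most `limit` rows as JSON."""
    cur = conn.execute(failing_sql)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchmany(limit)  # cap the payload that would go to XCom
    return json.dumps([dict(zip(columns, row)) for row in rows])


# Stand-in data: three orders with a missing total.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, None), (2, 9.5), (3, None), (4, None)],
)
payload = collect_unexpected_rows(
    conn, "SELECT id, total FROM orders WHERE total IS NULL ORDER BY id", limit=2
)
```

   Capping on the fetch side keeps the XCom payload bounded even when the 
generated SQL has no `LIMIT` of its own.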
   
   **Caching Mechanism:** To avoid redundant LLM calls, the operator stores 
LLM results in Airflow Variables and reuses them as long as the quality rules 
remain unchanged.
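   
   One possible way to detect "unchanged rules" is to hash a canonical form 
of the rule set and use the digest as the cache key. In the sketch below a 
plain dict stands in for Airflow Variables, and `rules_cache_key`, 
`get_or_generate`, and the key prefix are hypothetical names:

```python
import hashlib
import json


def rules_cache_key(rules, prefix="llm_dq_cache_"):
    """Derive a stable key from the rules; dict key order does not affect it."""
    canonical = json.dumps(rules, sort_keys=True, separators=(",", ":"))
    return prefix + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_or_generate(rules, store, generate):
    """Return the cached result for these rules, calling `generate` only on a miss."""
    key = rules_cache_key(rules)
    if key in store:
        return store[key]
    result = generate(rules)
    store[key] = result
    return result


calls = []


def fake_llm(rules):
    # Stand-in for the real LLM call; records each invocation.
    calls.append(len(rules))
    return "SELECT COUNT(*) FROM orders WHERE total IS NULL"


store = {}  # stand-in for Airflow Variables
rules = [{"table": "orders", "check": "total must not be null"}]
first = get_or_generate(rules, store, fake_llm)
second = get_or_generate(rules, store, fake_llm)  # served from the cache
```

   Any edit to a rule changes the digest, so a modified rule set falls 
through to a fresh LLM call without explicit invalidation.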
   
   **My Question:**
   I feel this covers the essential "Data Quality with LLM" workflow, but I’m 
wary of over-engineering. Does this feature set feel right for the initial 
version, or would you prefer a more "stripped-down" approach to start with?
   
   Looking forward to your guidance!

