ved-kashyap-samsung opened a new issue, #32408:
URL: https://github.com/apache/superset/issues/32408

   ## Motivation
   
   The goal of this proposal is to introduce a new feature into Apache Superset that leverages Large Language Models (LLMs) to provide advanced dashboard and chart summarization capabilities. The feature aims to enhance the user experience by enabling natural language queries, automated summarization, and intelligent chart selection based on user queries. It will also reduce dependency on less accurate NL-to-SQL conversion models by using LLMs directly for query processing.
   
   ## Proposed Change
   
   ### Overview
   
   We propose the integration of an LLM-based agentic architecture into 
Superset to enable the following capabilities:
   1. Natural language query support for dashboards and charts.
   2. Intelligent selection of appropriate charts based on natural language 
queries.
   3. Automated summarization of chart data and dashboards.
   4. Automated text-based reporting based on predefined KPIs and schedules.
   5. Graceful handling of scenarios where relevant charts are not found for a 
given query.
   
   ### Implementation Details
   
   1. **LLM Integration**:
      - Integrate an LLM capable of understanding and processing natural 
language queries.
      - Develop an agent-based system in which the LLM can perform actions such as selecting the charts relevant to the user's query, fetching the corresponding SQL queries from existing APIs, applying filters, running the final SQL query and retrieving its results, and finally summarizing those results and generating insights.
   
   2. **Natural Language Query Support**:
      - Add input fields at both dashboard and chart levels to support natural 
language queries.
      - Implement backend services to process these queries using the LLM.
   
   3. **Chart and Dashboard Summarization**:
      - Add a summarization option to the chart menu that summarizes the chart's currently loaded data.
      - Implement automated text-based reporting for dashboards, using cron-scheduled jobs that report on predefined KPIs.
   
   4. **Intelligent Chart Selection**:
      - Develop mechanisms for LLMs to pick the correct chart based on chart 
names or associated metadata.
      - Ensure graceful handling when no relevant charts are found for a query.
   
   5. **Feature Flags**:
      - Gate the feature behind feature flags so that users can opt in on a per-user basis.
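
As a rough illustration of the agent flow in item 1, together with the graceful no-match handling in item 4, the sketch below keeps the LLM behind a plain callable so the orchestration can be read (and tested) without a model. Everything here is hypothetical: `ChartInfo`, `select_chart`, `answer`, the prompt wording, and the `run_chart_query` callable, which stands in for fetching a chart's SQL through Superset's existing APIs, applying filters, and executing it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical LLM interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]
# Hypothetical query runner: (chart_id, filters) -> result rows. Stands in for
# fetching the chart's SQL via existing APIs, applying filters, and executing it.
QueryRunner = Callable[[int, dict], list]


@dataclass
class ChartInfo:
    """Chart metadata the agent can reason over (illustrative fields only)."""
    chart_id: int
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


def select_chart(question: str, charts: list[ChartInfo], llm: LLM) -> Optional[ChartInfo]:
    """Ask the LLM to pick one chart by id; return None if nothing is relevant."""
    catalog = "\n".join(
        f"{c.chart_id}: {c.name} | {c.description} | tags={','.join(c.tags)}"
        for c in charts
    )
    prompt = (
        "Pick the single chart that best answers the question. "
        "Reply with the chart id only, or NONE if no chart is relevant.\n"
        f"Question: {question}\nCharts:\n{catalog}"
    )
    reply = llm(prompt).strip()
    known = {str(c.chart_id): c for c in charts}
    # Any reply that is not a known chart id (including NONE) counts as no match.
    return known.get(reply)


def answer(
    question: str,
    charts: list[ChartInfo],
    llm: LLM,
    run_chart_query: QueryRunner,
    filters: Optional[dict] = None,
) -> str:
    """Select a chart, run its query with filters, and summarize the rows."""
    chart = select_chart(question, charts, llm)
    if chart is None:
        # Graceful fallback: say so instead of guessing or inventing SQL.
        return "No chart on this dashboard appears to answer that question."
    rows = run_chart_query(chart.chart_id, filters or {})
    data = "\n".join(str(row) for row in rows)
    summary_prompt = (
        f"Summarize the key insights for the question {question!r} "
        f"using data from the chart {chart.name!r}:\n{data}"
    )
    return llm(summary_prompt).strip()
```

Constraining the model to reply with a known chart id (or `NONE`) and validating that reply against the dashboard's charts is one simple way to keep a hallucinated id from reaching the query step; it is an assumption of this sketch rather than part of the proposal.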
   
   ### Mockups and Screenshots
   
   *Mockups and screenshots will be added here once the design phase is 
complete.*
   
   ## New or Changed Public Interfaces
   
   1. **REST Endpoints**:
      - New endpoints for processing natural language queries and returning 
summarized results.
      
   2. **React Components**:
      - New input fields for natural language queries at the dashboard and 
chart levels.
      - Updated chart menu with summarization options.
   
   3. **Configuration**:
      - Configuration options for enabling/disabling the feature using feature 
flags.
      
   4. **CLI Changes**:
      - New CLI commands for managing LLM-related configurations and feature 
flags.
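
A minimal sketch of how a natural-language query endpoint could be gated by a feature flag and validate its input. Superset reads feature flags from the `FEATURE_FLAGS` dictionary in `superset_config.py`; the flag name `LLM_NL_QUERY`, the payload fields, and `handle_nl_query` are hypothetical, and the handler is framework-free (returning a status code and a JSON-like body) rather than a real Flask-AppBuilder view.

```python
from dataclasses import dataclass
from typing import Optional

# superset_config.py-style flag dictionary; the flag name is hypothetical.
FEATURE_FLAGS = {"LLM_NL_QUERY": False}


@dataclass
class NLQueryRequest:
    """Hypothetical request body for a natural-language query endpoint."""
    question: str
    dashboard_id: int
    chart_id: Optional[int] = None  # None: let the agent choose among the dashboard's charts


def parse_request(payload: dict) -> NLQueryRequest:
    """Validate a JSON payload; raises ValueError or TypeError on bad input."""
    question = payload.get("question")
    if not isinstance(question, str) or not question.strip():
        raise ValueError("question must be a non-empty string")
    if "dashboard_id" not in payload:
        raise ValueError("dashboard_id is required")
    chart_id = payload.get("chart_id")
    return NLQueryRequest(
        question=question.strip(),
        dashboard_id=int(payload["dashboard_id"]),
        chart_id=int(chart_id) if chart_id is not None else None,
    )


def handle_nl_query(payload: dict, flags: Optional[dict] = None) -> tuple[int, dict]:
    """Return (HTTP status, JSON body); the LLM call itself is stubbed out."""
    flags = FEATURE_FLAGS if flags is None else flags
    if not flags.get("LLM_NL_QUERY", False):
        # Behave as if the endpoint did not exist while the feature is off.
        return 404, {"message": "Not found"}
    try:
        req = parse_request(payload)
    except (ValueError, TypeError) as exc:
        return 400, {"message": str(exc)}
    # A real handler would invoke the agent pipeline here to fill "summary".
    return 200, {
        "dashboard_id": req.dashboard_id,
        "chart_id": req.chart_id,
        "question": req.question,
        "summary": None,
    }
```

Returning 404 while the flag is off hides the endpoint entirely; returning 403 would be an equally reasonable choice.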
   
   ## New dependencies
   
   1. **LLM Libraries**:
      - We will integrate with existing LLM libraries and models, such as Meta's Meta-Llama-3-8B available through Hugging Face.
      - Ensure compatibility with Apache License v2.0.
   
   2. **Other Dependencies**:
      - Additional Python packages for natural language processing and machine 
learning (e.g., NLTK, spaCy).
   
   ## Migration Plan and Compatibility
   
   1. **Database Migrations**:
      - No database migrations are required for this feature.
      
   2. **Compatibility**:
      - Ensure that existing dashboards and charts continue to function without 
any changes.
      - Provide a seamless upgrade path with clear documentation on enabling 
and using the new feature.
   
   3. **Deprecation Strategy**:
      - Allow the new feature to coexist with existing NL-to-SQL conversion 
models during a deprecation period.
      - Provide clear documentation and migration guides for users 
transitioning to the new system.
   
   ## Rejected Alternatives
   
   1. **Enhancing Existing NL-to-SQL Models**:
      - While enhancing existing NL-to-SQL models could improve accuracy, it 
would require significant effort in model training and fine-tuning. The 
LLM-based approach offers a more flexible and scalable solution.
   
   2. **Rule-Based Systems**:
      - Rule-based systems lack the flexibility to handle the wide variety of 
natural language queries effectively. LLMs provide a more robust solution by 
understanding context and intent.
   
   By integrating an LLM-based agentic architecture into Superset, we can significantly enhance the user experience with advanced natural language processing capabilities, making it easier for users to interact with their data and generate insights.
   
   ---
   
   This SIP is now open for discussion. Please subscribe and provide your 
feedback here.

