[ 
https://issues.apache.org/jira/browse/IMPALA-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gokul Kolady updated IMPALA-14953:
----------------------------------
    Issue Type: Epic  (was: New Feature)

> Impala AI Query Profile Analyzer
> --------------------------------
>
>                 Key: IMPALA-14953
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14953
>             Project: IMPALA
>          Issue Type: Epic
>            Reporter: Gokul Kolady
>            Priority: Major
>
> h3. Summary
> Impala generates query profiles that describe all the intricate details of a 
> query's planning, execution, and resource usage. However, these profiles 
> have grown extremely large, and it is hard for the average user to parse 
> them to find the information of interest. We want to create an AI-driven 
> query profile analyzer that can be accessed from within the Impala Web UI. 
> This analyzer will ingest the given query profile as context and provide a 
> summary of the profile along with an analysis of performance bottlenecks 
> and their sources.
> h3. Background & Problem Statement
> In Impala, query profiles are the ultimate source of truth for diagnosing 
> performance issues. They describe all the intricate details of a query's 
> planning, execution, and resource usage (e.g., memory spills, scanner thread 
> wait times, join cardinality, and HDFS I/O).
> However, as workloads have scaled, these profiles have become incredibly 
> dense, highly technical documents, often spanning thousands of lines of 
> text or massive JSON structures. For the average data analyst, developer, 
> or even junior platform administrator, parsing through this wall of metrics 
> to find the actual root cause of a slow or failed query is overwhelming and 
> requires deep, specialized domain expertise.
> h3. Business Value
> By integrating GenAI directly into the diagnostic workflow, we can 
> democratize performance tuning. Instead of relying on escalation to Level 3 
> support or expert DBAs, average users will get instant, actionable insights 
> into why their query failed or ran slowly, and exactly how to fix it (e.g., 
> "Add table statistics," or "Fix data skew on the join key"). This will 
> drastically reduce support tickets and accelerate Mean Time To Resolution 
> (MTTR).
> h3. Proposed Solution & User Experience
> We will build an AI-driven Query Profile Analyzer natively embedded within 
> the existing Impala Web UI. When a user views a specific query execution in 
> the Web UI, they will see a new "AI Analysis" panel. The system will ingest 
> the query profile as context and instantly generate a plain-English summary 
> of the execution, highlighting the primary performance bottlenecks.
> h3. High-Level Acceptance Criteria
> h4. UI/UX Integration
> The Impala Web UI (specifically the query details page) includes a clearly 
> visible "AI Analysis" tab that contains a "Generate AI Analysis" button.
> h4. Context Ingestion & Prompting
> The backend must successfully parse the active query profile (stripping 
> unnecessary boilerplate to fit within standard LLM token limits if necessary) 
> and pass it to the AI model as context.
> The system must securely handle the query text and profile data, ensuring 
> that PII is handled according to enterprise security standards before being 
> sent to the LLM.
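> As a rough illustration of the PII-handling requirement above, the backend 
> could scrub literal values out of the query text before it is ever sent to 
> an external model. The function name, regexes, and placeholder tokens below 
> are illustrative assumptions, not part of Impala:

```python
import re

def redact_sql_literals(sql: str) -> str:
    """Hypothetical sketch: mask quoted strings and bare numbers in SQL
    text so that literal values (potential PII) never leave the cluster.
    Column and table names are kept, since the LLM needs them for analysis."""
    # Replace quoted string literals first, so digits inside them are
    # already gone before the numeric pass runs.
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "'<STR>'", sql)
    # Then replace standalone integer or decimal literals.
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "<NUM>", sql)
    return sql
```

> A real implementation would need to cover identifiers quoted with 
> backticks, hex literals, and dialect-specific escapes, but the shape of the 
> transformation is the same.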
> h4. Analysis Accuracy
> The AI's responses must explicitly reference the specific metrics from the 
> user's profile (e.g., "I see your TotalStorageWaitTime was 45 seconds...") 
> and map them to documented Impala behaviors.
> h4. Configurable AI Backend
> Administrators must have the ability to configure which LLM endpoint the 
> analyzer points to (e.g., an internal enterprise model or a secure external 
> API) via Cloudera Manager or Impala startup flags.
> The feature must be an optional function in case an organization's security 
> policy prohibits sending diagnostic data to an AI model.
> h4. Performance & Error Handling
> The system must parse the profile into digestible pieces that help the LLM 
> retrieve information about the query without exceeding its context window 
> limit.
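> A minimal sketch of that chunking step, assuming a plain-text profile and a 
> crude ~4-characters-per-token heuristic (both assumptions; real tokenizer 
> counts and Impala profile section boundaries would replace them):

```python
def chunk_profile(profile_text: str, max_tokens: int = 4000) -> list[str]:
    """Hypothetical sketch: split a plain-text query profile into chunks
    that each stay under a rough token budget, breaking only at line
    boundaries so individual counters are never cut in half."""
    max_chars = max_tokens * 4  # assumed chars-per-token heuristic
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in profile_text.splitlines():
        line_len = len(line) + 1  # account for the newline
        if size + line_len > max_chars and current:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += line_len
    if current:
        chunks.append("\n".join(current))
    return chunks
```

> Splitting at profile section headers (plan nodes, fragment instances) 
> instead of raw line counts would keep each chunk self-describing for the 
> model.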



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
