Gokul Kolady created IMPALA-14953:
-------------------------------------

             Summary: Impala AI Query Profile Analyzer
                 Key: IMPALA-14953
                 URL: https://issues.apache.org/jira/browse/IMPALA-14953
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Gokul Kolady


h3. Summary

Impala query profiles describe every detail of a query's planning, execution, 
and resource usage. These profiles have grown very large, and the average user 
struggles to find the information they need in them. We want to create an 
AI-driven query profile analyzer that is accessible from within the Impala Web 
UI. The analyzer will ingest a given query profile as context and provide a 
summary of the profile, along with an analysis of performance bottlenecks and 
their sources.
h3. Background & Problem Statement

In Impala, query profiles are the ultimate source of truth for diagnosing 
performance issues. They describe all the intricate details of a query's 
planning, execution, and resource usage (e.g., memory spills, scanner thread 
wait times, join cardinality, and HDFS I/O).

However, as workloads have scaled, these profiles have become dense, highly 
technical documents, often spanning thousands of lines of text or massive JSON 
structures. For the average data analyst, developer, or even junior platform 
administrator, finding the root cause of a slow or failed query in this wall of 
metrics is overwhelming and requires deep, specialized domain expertise.
h3. Business Value

By integrating GenAI directly into the diagnostic workflow, we can democratize 
performance tuning. Instead of relying on escalation to Level 3 support or 
expert DBAs, average users will get instant, actionable insights into why their 
query failed or ran slowly, and exactly how to fix it (e.g., "Add table 
statistics," or "Fix data skew on the join key"). This will drastically reduce 
support tickets and accelerate Mean Time To Resolution (MTTR).
h3. Proposed Solution & User Experience

We will build an AI-driven Query Profile Analyzer natively embedded within the 
existing Impala Web UI. When a user views a specific query execution in the Web 
UI, they will see a new "AI Analysis" panel. The system will ingest the query 
profile as context and instantly generate a plain-English summary of the 
execution, highlighting the primary performance bottlenecks.
h3. High-Level Acceptance Criteria
h4. UI/UX Integration

The Impala Web UI (specifically the query details page) includes a clearly 
visible "AI Analysis" tab that contains a "Generate AI Analysis" button.
h4. Context Ingestion & Prompting

The backend must successfully parse the active query profile (stripping 
unnecessary boilerplate to fit within standard LLM token limits if necessary) 
and pass it to the AI model as context.
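
As a rough illustration of the "stripping unnecessary boilerplate" step, the 
sketch below drops blank lines and zero-valued counters from a text profile and 
then stops once a token budget is reached. Everything here is an assumption for 
illustration: the function name trim_profile, the "- Name: value" counter 
shape, and the 4-characters-per-token estimate are not part of any existing 
Impala code.

```python
import re

# Hypothetical heuristic: roughly 4 characters per token. Real tokenizers differ.
CHARS_PER_TOKEN = 4

# Assumed counter-line shape "- Name: value"; a plain zero value is treated as noise.
ZERO_COUNTER = re.compile(r"^\s*-\s+\w+:\s+0(\.0+)?(\s|$)")


def trim_profile(profile_text: str, max_tokens: int) -> str:
    """Drop blank lines and zero-valued counters, then stop at the token budget."""
    budget = max_tokens * CHARS_PER_TOKEN
    kept = []
    used = 0
    for line in profile_text.splitlines():
        if not line.strip() or ZERO_COUNTER.match(line):
            continue
        cost = len(line) + 1  # +1 accounts for the newline
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

A real implementation would likely rank sections by relevance rather than 
simply cutting off the tail of the profile.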

The system must securely handle the query text and profile data, ensuring that 
PII is handled according to enterprise security standards before being sent to 
the LLM.
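
One possible shape for the pre-send scrubbing step is sketched below: quoted 
string literals in the query text and anything that looks like an email address 
are replaced with placeholders. The function name redact and both patterns are 
hypothetical; what actually counts as PII would be defined by the deploying 
organization's security standards, not by this sketch.

```python
import re

# Assumed patterns for illustration only; a real policy would be broader.
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace single-quoted literals and email-like tokens with placeholders."""
    text = STRING_LITERAL.sub("'<redacted>'", text)
    return EMAIL.sub("<email>", text)
```

For example, redact("WHERE name = 'Alice'") yields "WHERE name = '<redacted>'", 
so the query shape is preserved while the literal value is withheld.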
h4. Analysis Accuracy

The AI's responses must explicitly reference the specific metrics from the 
user's profile (e.g., "I see your TotalStorageWaitTime was 45 seconds...") and 
map them to documented Impala behaviors.
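
One way this criterion could be checked automatically is to confirm that every 
metric name the model cites actually appears in the profile it was given. The 
sketch below is only an illustration under stated assumptions: counters are 
assumed to appear as "- Name: value" lines, metric names are assumed to be 
CamelCase identifiers, and the function names are hypothetical.

```python
import re

# Assumed counter-line shape in a text profile: "- Name: value".
COUNTER_NAME = re.compile(r"^\s*-\s+(\w+):", re.MULTILINE)

# Assumed shape of a metric name in the response: two or more CamelCase humps.
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def profile_counter_names(profile_text: str) -> set[str]:
    """Collect the counter names present in the profile."""
    return set(COUNTER_NAME.findall(profile_text))


def unknown_metrics(response: str, profile_text: str) -> set[str]:
    """Return metric-like names cited in the response but absent from the profile."""
    cited = set(CAMEL_CASE.findall(response))
    return cited - profile_counter_names(profile_text)
```

A non-empty result would suggest the response mentions a metric that is not in 
the user's profile and may need to be regenerated or flagged.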
h4. Configurable AI Backend

Administrators must have the ability to configure which LLM endpoint the 
analyzer points to (e.g., an internal enterprise model or a secure external 
API) via Cloudera Manager or Impala startup flags.

The feature must be optional, so that it can be disabled when an organization's 
security policy prohibits sending diagnostic data to an AI model.
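
The two requirements above could be modeled as a small configuration object 
that is disabled by default and validated only when enabled. This is a sketch, 
not a proposal for actual flag names: AnalyzerConfig, its fields, and the 
https-only rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AnalyzerConfig:
    """Hypothetical settings; real flag names would be decided during design."""
    enabled: bool = False  # off unless an administrator opts in
    endpoint: str = ""     # URL of the LLM endpoint to call


def validate(config: AnalyzerConfig) -> list[str]:
    """Return a list of problems; an empty list means the config is usable."""
    if not config.enabled:
        return []  # nothing to check when the feature is turned off
    parsed = urlparse(config.endpoint)
    problems = []
    if parsed.scheme != "https":
        problems.append("endpoint must use https")  # assumed policy
    if not parsed.netloc:
        problems.append("endpoint must include a host")
    return problems
```

Defaulting to disabled means that no profile data leaves the cluster unless an 
administrator has explicitly configured an endpoint.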
h4. Performance & Error Handling

The system must parse the profile into digestible pieces that help the LLM 
retrieve information about the query without exceeding its context window limit.
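
A minimal sketch of such chunking, assuming a text profile in which each section 
starts with a header line followed by "- Name: value" counter lines: sections 
are kept whole and packed greedily into chunks under a character budget. The 
function names and the section heuristic are hypothetical.

```python
def split_sections(profile_text: str) -> list[str]:
    """Group each header line with the counter lines that follow it."""
    sections: list[list[str]] = []
    for line in profile_text.splitlines():
        if not line.strip():
            continue
        is_counter = line.lstrip().startswith("- ")
        if is_counter and sections:
            sections[-1].append(line)
        else:
            sections.append([line])  # a non-counter line opens a new section
    return ["\n".join(section) for section in sections]


def pack_chunks(sections: list[str], max_chars: int) -> list[str]:
    """Greedily pack whole sections into chunks of at most max_chars.

    A single section larger than max_chars is emitted as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for section in sections:
        candidate = section if not current else current + "\n" + section
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Keeping sections intact means a counter is never separated from the header that 
gives it meaning, at the cost of occasionally uneven chunk sizes.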



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

