[
https://issues.apache.org/jira/browse/IMPALA-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gokul Kolady updated IMPALA-14953:
----------------------------------
Issue Type: Epic (was: New Feature)
> Impala AI Query Profile Analyzer
> --------------------------------
>
> Key: IMPALA-14953
> URL: https://issues.apache.org/jira/browse/IMPALA-14953
> Project: IMPALA
> Issue Type: Epic
> Reporter: Gokul Kolady
> Priority: Major
>
> h3. Summary
> Impala query profiles describe the details of a query's planning,
> execution, and resource usage. However, these profiles are often extremely
> large and hard for the average user to parse when looking for the
> information they need. We want to create an AI-driven query profile
> analyzer that can be accessed from within the Impala Web UI. This analyzer
> will ingest the given query profile as context and provide a summary of the
> profile along with an analysis of performance bottlenecks and their
> sources.
> h3. Background & Problem Statement
> In Impala, query profiles are the ultimate source of truth for diagnosing
> performance issues. They describe all the intricate details of a query's
> planning, execution, and resource usage (e.g., memory spills, scanner thread
> wait times, join cardinality, and HDFS I/O).
> However, as workloads have scaled, these profiles have become dense,
> highly technical documents, often spanning thousands of lines of text or
> massive JSON structures. For the average data analyst, developer, or even
> junior platform administrator, parsing through this wall of metrics to find
> the actual root cause of a slow or failed query is overwhelming and requires
> deep, specialized domain expertise.
> h3. Business Value
> By integrating GenAI directly into the diagnostic workflow, we can
> democratize performance tuning. Instead of relying on escalation to Level 3
> support or expert DBAs, average users will get instant, actionable insights
> into why their query failed or ran slowly, and exactly how to fix it (e.g.,
> "Add table statistics," or "Fix data skew on the join key"). This will
> drastically reduce support tickets and accelerate Mean Time To Resolution
> (MTTR).
> h3. Proposed Solution & User Experience
> We will build an AI-driven Query Profile Analyzer natively embedded within
> the existing Impala Web UI. When a user views a specific query execution in
> the Web UI, they will see a new "AI Analysis" panel. The system will ingest
> the query profile as context and instantly generate a plain-English summary
> of the execution, highlighting the primary performance bottlenecks.
> h3. High-Level Acceptance Criteria
> h4. UI/UX Integration
> The Impala Web UI (specifically the query details page) includes a clearly
> visible "AI Analysis" tab that contains a "Generate AI Analysis" button.
> h4. Context Ingestion & Prompting
> The backend must successfully parse the active query profile (stripping
> unnecessary boilerplate to fit within standard LLM token limits if necessary)
> and pass it to the AI model as context.
> The system must securely handle the query text and profile data, ensuring
> that PII is handled according to enterprise security standards before being
> sent to the LLM.
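
A minimal sketch of the ingestion step described above, in Python for illustration only. It assumes a plain-text profile, masks e-mail addresses and SQL string literals before anything leaves the cluster, and truncates once a rough token budget is used up. The patterns, the 4-characters-per-token estimate, and the names `redact` and `build_context` are hypothetical; a real implementation would follow the organization's own security policy and the target model's tokenizer.

```python
import re

# Hypothetical redaction patterns; not an exhaustive PII policy.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
STRING_LITERAL_RE = re.compile(r"'(?:[^'\\]|\\.)*'")


def redact(text: str) -> str:
    """Mask e-mail addresses and single-quoted SQL string literals."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return STRING_LITERAL_RE.sub("'<REDACTED>'", text)


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def build_context(profile_text: str, max_tokens: int) -> str:
    """Drop blank lines, redact each line, and stop at the token budget."""
    kept = []
    used = 0
    for line in profile_text.splitlines():
        stripped = line.rstrip()
        if not stripped:
            continue  # blank lines carry no information for the model
        cleaned = redact(stripped)
        cost = estimate_tokens(cleaned)
        if used + cost > max_tokens:
            kept.append("[profile truncated]")
            break
        kept.append(cleaned)
        used += cost
    return "\n".join(kept)
```

Redacting line by line before counting tokens keeps the budget check aligned with what is actually sent.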
> h4. Analysis Accuracy
> The AI's responses must explicitly reference the specific metrics from the
> user's profile (e.g., "I see your TotalStorageWaitTime was 45 seconds...")
> and map them to documented Impala behaviors.
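
One way this criterion could be checked automatically is to compare the metric names cited in the model's answer against the counters actually present in the profile. The sketch below is an assumption-laden illustration: it assumes counters appear in the text profile as `- Name: value` lines and that metric names are CamelCase identifiers; `parse_metrics` and `unsupported_citations` are hypothetical helper names, not part of Impala.

```python
import re

# Assumes counters are rendered as "- Name: value" lines in the text profile.
METRIC_LINE_RE = re.compile(r"^\s*-\s*([A-Za-z][A-Za-z0-9_.]*)\s*:\s*(.+?)\s*$")
# Heuristic: treat CamelCase words in the answer as cited metric names.
CITED_RE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\b")


def parse_metrics(profile_text: str) -> dict:
    """Collect counter names and raw values; later duplicates overwrite earlier ones."""
    metrics = {}
    for line in profile_text.splitlines():
        match = METRIC_LINE_RE.match(line)
        if match:
            metrics[match.group(1)] = match.group(2)
    return metrics


def unsupported_citations(response: str, metrics: dict) -> list:
    """Return cited metric names that do not occur in the profile, sorted."""
    cited = set(CITED_RE.findall(response))
    return sorted(name for name in cited if name not in metrics)
```

For example, given a profile containing only `TotalStorageWaitTime`, an answer that also mentions `BytesSpilled` would be flagged, and the UI could warn the user or re-prompt the model.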
> h4. Configurable AI Backend
> Administrators must have the ability to configure which LLM endpoint the
> analyzer points to (e.g., an internal enterprise model or a secure external
> API) via Cloudera Manager or Impala startup flags.
> The feature must be optional, in case an organization's security policy
> prohibits sending diagnostic data to an AI model.
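
To make the two requirements above concrete (a configurable endpoint and an off-by-default switch), here is a small validation sketch. The field names `enabled` and `endpoint` are hypothetical and do not correspond to existing Impala startup flags or Cloudera Manager settings; requiring HTTPS is likewise an assumption drawn from the "secure external API" wording.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class AnalyzerConfig:
    """Hypothetical settings for the analyzer; names are illustrative only."""
    enabled: bool = False            # feature stays off unless explicitly enabled
    endpoint: Optional[str] = None   # LLM endpoint URL chosen by the administrator


def validate(config: AnalyzerConfig) -> AnalyzerConfig:
    """Reject an enabled analyzer that has no usable HTTPS endpoint."""
    if not config.enabled:
        return config  # nothing to check; no data will be sent anywhere
    if not config.endpoint:
        raise ValueError("analyzer is enabled but no endpoint is configured")
    parsed = urlparse(config.endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("endpoint must be an absolute https:// URL")
    return config
```

Validating at startup rather than on first use would surface a misconfiguration before any user clicks "Generate AI Analysis".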
> h4. Performance & Error Handling
> The system must parse the profile into digestible pieces that help the LLM
> retrieve information about the query without exceeding its context window
> limit.
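
A rough sketch of the chunking requirement above, assuming a plain-text profile. It packs whole lines into chunks under a character budget, which stands in here for a real token budget; `chunk_profile` is a hypothetical name, and a production version would more likely split along profile sections (for example, per fragment or per operator) rather than by raw size.

```python
def chunk_profile(profile_text: str, max_chars: int) -> list:
    """Pack lines into chunks of at most max_chars characters each."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    current = []  # lines collected for the chunk being built
    size = 0      # length of "\n".join(current)
    for line in profile_text.splitlines():
        # A single over-long line is hard-split so no chunk exceeds the budget.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)] or [""]
        for piece in pieces:
            added = len(piece) + (1 if current else 0)  # +1 for the joining newline
            if current and size + added > max_chars:
                chunks.append("\n".join(current))
                current, size = [], 0
                added = len(piece)
            current.append(piece)
            size += added
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk could then be summarized or indexed separately, so the model only sees the pieces relevant to a given question.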