korbit-ai[bot] commented on code in PR #32649: URL: https://github.com/apache/superset/pull/32649#discussion_r1993418860
########## superset/dashboards/dashboard_agentic_query/dashboard_agentic_query.py: ########## @@ -0,0 +1,99 @@ +from langchain.agents import Tool, ZeroShotAgent, AgentExecutor +from langchain_openai import ChatOpenAI +from langchain import LLMChain +from langchain.prompts import PromptTemplate +import time + +import urllib3 +from superset.dashboards.dashboard_agentic_query.get_all_charts_name import get_charts_list +from superset.dashboards.dashboard_agentic_query.get_chart_data import get_chart_data +from superset.dashboards.dashboard_agentic_query.unix_timestamp_to_human_readable import convert_unix_to_human_readable +from flask import current_app + +urllib3.disable_warnings(category=urllib3.exceptions.InsecureRequestWarning) + +DASHBOARD_AGENTIC_QUERY_CONFIG = current_app.config["DASHBOARD_AGENTIC_QUERY_CONFIG"] + +mrkl_template_value = ''' +Answer the following questions as best you can. You have access to the following tools: +-> You must never do any hallucinations. +-> You must never call a tool more than once for same input +-> If date is given in Unix timestamps then convert it into human readable format + +get_charts_list: This tool will give list of all chart_names in a dashboard along with their chart_id, it takes a string dashboard_id as input. +get_chart_data: This tool will return the data used to create a particular chart, it takes a string chart_id as input. +convert_unix_to_human_readable: This tool will convert unix timestamp into human readable format, it takes a string unix_timestamp and returns a string i.e human readable format of that timestamp. + +Use the following format: + +Question: the input question you must answer +Thought: you should always think about what to do +Action: the action to take, should be one of [get_charts_list, get_chart_data] Review Comment: ### Incomplete Tool List in Prompt Template <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? The prompt template incorrectly states that actions should be one of [get_charts_list, get_chart_data], excluding the convert_unix_to_human_readable tool. ###### Why this matters This mismatch between available tools and prompt instructions could prevent the agent from using the timestamp conversion functionality, leading to incorrect or incomplete responses when dealing with Unix timestamps. ###### Suggested change ∙ *Feature Preview* Update the prompt template to include all available tools: ```python Action: the action to take, should be one of [get_charts_list, get_chart_data, convert_unix_to_human_readable] ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/5d87a6ca-92fb-4ac7-8c14-a33d6d40d6f7?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:475c13d9-a1df-4615-a20f-7584d953f756 --> ########## superset/dashboards/dashboard_agentic_query/utils.py: ########## @@ -0,0 +1,54 @@ +import json +import ast + +def correct_lst(input_str): + first_open = -1 + last_close = -1 + for i in range(len(input_str)): + if(input_str[i]=='[' and first_open==-1): + first_open = i + elif(input_str[i]==']'): + last_close = i + + return input_str[first_open : last_close + 1] Review Comment: ### Unsafe List Extraction Logic <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? The function fails to handle cases where no brackets are found (first_open or last_close remain -1) or where there are nested lists. ###### Why this matters This can lead to IndexError exceptions or incorrect list extraction when processing nested data structures, which is critical for dashboard data processing. ###### Suggested change ∙ *Feature Preview* ```python def correct_lst(input_str): if not input_str: raise ValueError("Empty input string") first_open = input_str.find('[') if first_open == -1: raise ValueError("No opening bracket found") bracket_count = 0 for i in range(first_open, len(input_str)): if input_str[i] == '[': bracket_count += 1 elif input_str[i] == ']': bracket_count -= 1 if bracket_count == 0: return input_str[first_open:i + 1] raise ValueError("Unmatched brackets") ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ceadf390-bd0f-4308-9208-524dd6454fce?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:72c613b2-9e13-4caf-b7cc-3469756eb92a --> ########## superset/dashboards/dashboard_agentic_query/utils.py: ########## @@ -0,0 +1,54 @@ +import json +import ast + +def correct_lst(input_str): + first_open = -1 + last_close = -1 + for i in range(len(input_str)): + if(input_str[i]=='[' and first_open==-1): + first_open = i + elif(input_str[i]==']'): + last_close = i + + return input_str[first_open : last_close + 1] + +def refactor_input(input_str): + input_str = str(input_str) + if('\n' in input_str): + input_str = input_str.split('\n')[0] + input_str = input_str.strip() + if(input_str.startswith("'") or input_str.startswith('"')): + input_str = input_str[1:] + if(input_str.endswith("'") or input_str.endswith('"')): + input_str = input_str[:-1] + return input_str + +def extract_int_if_possible(input_str): + if(':' in input_str): + input_str = input_str.split(':')[1] + elif('=' in input_str): + input_str = input_str.split('=')[1] + + return input_str.strip() + +def extract_json_from_string(input_str): + start_open_curly = -1 + end_close_curly = -1 + + for i in range(len(input_str)): + if(input_str[i] == '{' and start_open_curly == -1): + start_open_curly = i + elif(input_str[i] == '}'): + end_close_curly = i Review Comment: ### Invalid JSON Extraction <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? The function doesn't handle nested JSON objects correctly and can return invalid JSON if the input contains multiple closing braces. ###### Why this matters When processing complex dashboard data that contains nested JSON structures, this can lead to corrupted data being returned to the agentic query feature. ###### Suggested change ∙ *Feature Preview* ```python def extract_json_from_string(input_str): try: start = input_str.find('{') if start == -1: raise ValueError("No JSON object found") brace_count = 0 for i in range(start, len(input_str)): if input_str[i] == '{': brace_count += 1 elif input_str[i] == '}': brace_count -= 1 if brace_count == 0: # Validate the extracted JSON result = input_str[start:i + 1] json.loads(result) # Will raise JSONDecodeError if invalid return result raise ValueError("Unmatched braces") except json.JSONDecodeError: raise ValueError("Invalid JSON structure") ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/edde0fe4-3aca-4015-9321-c622ac44fd0a?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:0c1f22e1-9938-4a11-b0bf-fc60b6f99e51 --> ########## superset/dashboards/dashboard_agentic_query/unix_timestamp_to_human_readable.py: ########## @@ -0,0 +1,17 @@ +import datetime +from superset.dashboards.dashboard_agentic_query.utils import refactor_input, extract_int_if_possible + +def convert_unix_to_human_readable(unix_timestamp): + unix_timestamp = refactor_input(unix_timestamp) + unix_timestamp = extract_int_if_possible(unix_timestamp) + if(len(unix_timestamp) > 10): + unix_timestamp = unix_timestamp[:10] Review Comment: ### Invalid timestamp truncation <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? Arbitrary truncation of timestamps longer than 10 digits could lead to incorrect date conversions ###### Why this matters This can result in wrong date calculations for timestamps after September 2001 (timestamps > 999999999) or for millisecond precision timestamps ###### Suggested change ∙ *Feature Preview* Handle different timestamp precisions properly: ```python def convert_unix_to_human_readable(unix_timestamp): unix_timestamp = refactor_input(unix_timestamp) unix_timestamp = extract_int_if_possible(unix_timestamp) # Handle milliseconds precision if len(str(unix_timestamp)) > 10: unix_timestamp = int(unix_timestamp) / 1000 else: unix_timestamp = int(unix_timestamp) ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/34c6a833-236c-47bb-aba8-dc8238f60afe?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:75e51403-7ce2-46c1-96c4-fb10fcd5a6b3 --> ########## docker/pythonpath_dev/superset_config.py: ########## @@ -98,7 +98,18 @@ CELERY_CONFIG = CeleryConfig -FEATURE_FLAGS = {"ALERT_REPORTS": True} +FEATURE_FLAGS = {"ALERT_REPORTS": True, "SHOW_DASHBOARD_AGENT_QUERY": False} + +DASHBOARD_AGENTIC_QUERY_CONFIG = { + "model": "gpt-4o", Review Comment: ### Invalid AI Model Name <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? The model name 'gpt-4o' appears to be incorrect as it's not a standard OpenAI model name (should probably be 'gpt-4' or 'gpt-3.5-turbo') ###### Why this matters Using an invalid model name will cause API calls to fail, preventing the Agentic Query feature from functioning ###### Suggested change ∙ *Feature Preview* Correct the model name to a valid OpenAI model identifier: ```python "model": "gpt-4", # or gpt-3.5-turbo depending on requirements ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/9edc9cad-418c-4112-b2cf-e65bd77e27d4?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:ce8cd898-5667-43ac-af5e-9f8c19a43a56 --> ########## superset/dashboards/dashboard_agentic_query/get_chart_data.py: ########## @@ -0,0 +1,33 @@ +from superset.dashboards.dashboard_agentic_query.utils import refactor_input, extract_int_if_possible +import json +from superset.charts.data.api import ChartDataRestApi + +def convert_first_k_rows_to_string(chart_data, k=50): + chart_data_str = "" + if(len(chart_data)>k): + chart_data=chart_data[:k] + + for index, row in enumerate(chart_data): + if(index == 0): + chart_data_str = 'S.no\t\t\t' + for key, value in row.items(): + chart_data_str += key + chart_data_str += '\t\t\t' + + chart_data_str += '\n\n' + chart_data_str += str(index+1) + "\t\t\t" + for key, value in row.items(): + chart_data_str += str(value) + chart_data_str += '\t\t\t' + return chart_data_str + +def get_chart_data(chart_id) -> str: + chart_id = refactor_input(chart_id) + chart_id = extract_int_if_possible(chart_id) Review Comment: ### Missing Authorization Check for Chart Data Access <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? The chart_id input validation relies on external utility functions without clear validation guarantees. The code doesn't explicitly verify if the user has permission to access the requested chart data. ###### Why this matters Without proper authorization checks, users might be able to access chart data they shouldn't have permission to view. This could lead to data exposure vulnerabilities. ###### Suggested change ∙ *Feature Preview* ```python def get_chart_data(chart_id) -> str: # Add authorization check before processing if not current_user.has_access_to_chart(chart_id): raise PermissionError('User not authorized to access this chart') chart_id = refactor_input(chart_id) chart_id = extract_int_if_possible(chart_id) ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/69a3ba5c-09bd-4c2a-a266-71cdeac7ebaa?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:93353a66-eccc-448f-9b2f-cfbe983d06b7 --> ########## superset/dashboards/dashboard_agentic_query/get_all_charts_name.py: ########## @@ -0,0 +1,20 @@ +from superset.daos.dashboard import DashboardDAO +from superset.dashboards.dashboard_agentic_query.utils import refactor_input, extract_int_if_possible +from superset.charts.schemas import ChartEntityResponseSchema + +def get_charts_list(dashboard_id) -> str: Review Comment: ### Incorrect Return Type Annotation <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? The return type annotation is incorrect. The function returns a list of dictionaries, not a string. ###### Why this matters This incorrect type hint could mislead other developers and cause type checking tools to fail. The function actually returns a list of dictionaries containing chart information. ###### Suggested change ∙ *Feature Preview* Correct the return type annotation to List[Dict[str, Union[str, int]]]: ```python from typing import List, Dict, Union def get_charts_list(dashboard_id) -> List[Dict[str, Union[str, int]]]: ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/ed8882d5-8515-4532-8c0c-d94716242760?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:9f03b144-ed3f-43e7-9c50-aec321bc4ffa --> ########## docker/pythonpath_dev/superset_config.py: ########## @@ -98,7 +98,18 @@ class CeleryConfig: CELERY_CONFIG = CeleryConfig -FEATURE_FLAGS = {"ALERT_REPORTS": True} +FEATURE_FLAGS = {"ALERT_REPORTS": True, "SHOW_DASHBOARD_AGENT_QUERY": False} + +DASHBOARD_AGENTIC_QUERY_CONFIG = { + "model": "gpt-4o", + "temperature": 0.8, + "max_tokens": None, + "timeout": None, + "max_retries": 2, + "api_key": "-", + "base_url": "", Review Comment: ### Inconsistent Placeholder Values <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? Placeholder values for sensitive configuration use non-standard conventions. ###### Why this matters Using '-' and '' as placeholders is inconsistent and may be confused with valid values. This makes it unclear whether these are intentionally empty or missing values. ###### Suggested change ∙ *Feature Preview* ```python "api_key": None, # Required: Set via environment variable "base_url": None, # Required: Set via environment variable ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/58857d6f-d9e0-42b9-a3ff-3b640f00f2e4?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:795efae5-1b74-4ef1-a4e2-642f6c5ac777 --> ########## superset/dashboards/dashboard_agentic_query/utils.py: ########## @@ -0,0 +1,54 @@ +import json +import ast + +def correct_lst(input_str): + first_open = -1 + last_close = -1 + for i in range(len(input_str)): + if(input_str[i]=='[' and first_open==-1): + first_open = i + elif(input_str[i]==']'): + last_close = i + + return input_str[first_open : last_close + 1] + +def refactor_input(input_str): + input_str = str(input_str) + if('\n' in input_str): + input_str = input_str.split('\n')[0] + input_str = input_str.strip() + if(input_str.startswith("'") or input_str.startswith('"')): + input_str = input_str[1:] + if(input_str.endswith("'") or input_str.endswith('"')): + input_str = input_str[:-1] + return input_str + +def extract_int_if_possible(input_str): Review Comment: ### Misleading Function Name <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? Misleading function name as it doesn't actually extract or convert to an integer ###### Why this matters The function name suggests integer extraction but it only performs string splitting, leading to confusion for maintainers. ###### Suggested change ∙ *Feature Preview* ```python def extract_value_after_delimiter(input_str): ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/b68d9c29-daa1-4e57-8cc2-416bdfc3d266?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:7343bb27-112c-47fa-8d18-a55d33c36ce5 --> ########## superset/dashboards/dashboard_agentic_query/dashboard_agentic_query.py: ########## @@ -0,0 +1,99 @@ +from langchain.agents import Tool, ZeroShotAgent, AgentExecutor +from langchain_openai import ChatOpenAI +from langchain import LLMChain +from langchain.prompts import PromptTemplate +import time + +import urllib3 +from superset.dashboards.dashboard_agentic_query.get_all_charts_name import get_charts_list +from superset.dashboards.dashboard_agentic_query.get_chart_data import get_chart_data +from superset.dashboards.dashboard_agentic_query.unix_timestamp_to_human_readable import convert_unix_to_human_readable +from flask import current_app + +urllib3.disable_warnings(category=urllib3.exceptions.InsecureRequestWarning) + +DASHBOARD_AGENTIC_QUERY_CONFIG = current_app.config["DASHBOARD_AGENTIC_QUERY_CONFIG"] + +mrkl_template_value = ''' Review Comment: ### Unclear Variable Name <sub></sub> <details> <summary>Tell me more</summary> ###### What is the issue? The variable name 'mrkl_template_value' is unclear and uses an unexplained abbreviation. ###### Why this matters Cryptic variable names force readers to research or guess their meaning, slowing down code comprehension. ###### Suggested change ∙ *Feature Preview* ```python agent_prompt_template = ''' ``` </details> <sub> [](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/11337ec3-aeac-4ae6-8b01-f6f447de4e67?suggestedFixEnabled=true) 💬 Looking for more details? Reply to this comment to chat with Korbit. </sub> <!--- korbi internal id:201b5b7b-c4c2-4cc5-8e90-a17226118c13 --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
