codeant-ai-for-open-source[bot] commented on code in PR #41264:
URL: https://github.com/apache/superset/pull/41264#discussion_r3447628562


##########
.cursor/hooks/audit-tool-calls.sh:
##########
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Cursor audit hook: append agent tool call details to JSONL logs under
+# .cursor/audit-logs/. Intended for postToolUse and postToolUseFailure.
+#
+# Always exits 0 so auditing never blocks the agent.
+
+input=$(cat)
+
+log_dir=".cursor/audit-logs"
+mkdir -p "$log_dir"
+
+date_str=$(date -u +"%Y-%m-%d")
+log_file="${log_dir}/${date_str}.jsonl"
+timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+
+event=$(echo "$input" | jq -r '.hook_event_name // "unknown"')
+
+case "$event" in
+  postToolUse)
+    status="success"
+    ;;
+  postToolUseFailure)
+    status=$(echo "$input" | jq -r '.failure_type // "failure"')
+    ;;
+  *)
+    status="$event"
+    ;;
+esac
+
+# Cap large tool output to keep log files manageable.
+echo "$input" | jq -c \
+  --arg ts "$timestamp" \
+  --arg status "$status" \
+  '{
+    timestamp: $ts,
+    status: $status,
+    hook_event_name: .hook_event_name,
+    conversation_id: (.conversation_id // null),
+    generation_id: (.generation_id // null),
+    tool_name: (.tool_name // null),
+    tool_use_id: (.tool_use_id // null),
+    cwd: (.cwd // null),
+    duration_ms: (.duration // null),
+    model: (.model // null),
+    user_email: (.user_email // null),
+    tool_input: (.tool_input // null),
+    tool_output: (
+      if (.tool_output | type) == "string" and (.tool_output | length) > 8000 then
+        .tool_output[0:8000] + "...[truncated]"
+      else
+        .tool_output // null

Review Comment:
   **Suggestion:** The audit record stores raw `tool_input`, `tool_output`, and
   `user_email` without redaction, which can persist credentials, tokens,
   personal data, and query results in plaintext logs. Add redaction/masking for
   sensitive fields before writing to disk and avoid logging full payloads by
   default. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Audit logs persist user emails and tool payload secrets.
   - ⚠️ Local developers risk leaking credentials via synced logs.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Trigger any Cursor tool call so that the audit hook
   `.cursor/hooks/audit-tool-calls.sh` runs with a payload containing
   `user_email`, `tool_input`, and `tool_output` fields (these are read and
   logged at lines 46–52).

   2. For a concrete test, create a file `sensitive-event.json` with JSON like
   `{"hook_event_name":"postToolUse","user_email":"[email protected]","tool_input":{"password":"secret123"},"tool_output":"API token: abc123"}`
   and pipe it to the hook:
   `cat sensitive-event.json | .cursor/hooks/audit-tool-calls.sh`.

   3. The hook reads stdin into `input` (line 8), computes metadata (lines
   13–29), and then runs `jq -c` at lines 32–58, embedding `.user_email`,
   `.tool_input` and `.tool_output` directly into the log record without masking
   (lines 46–52).

   4. Open `.cursor/audit-logs/<YYYY-MM-DD>.jsonl` and observe that the
   resulting JSONL entry contains the full `user_email`, the entire `tool_input`
   object (including `password`), and the `tool_output` string (including
   `API token: abc123`), demonstrating that sensitive user and credential data
   are persisted in plaintext audit logs.
   ```
   </details>
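
One way to act on the suggestion above is to build the log record from a redacted view of the event instead of the raw payload. The following is only a minimal sketch, not the PR's code: the function name `redact_event`, the key-name pattern, and the `tool_output_length` field are assumptions made for illustration. It masks values under sensitive-looking keys in `tool_input`, keeps only the domain of `user_email`, and records the size of `tool_output` rather than its content.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): reads one hook event as JSON on
# stdin and prints a compact record with sensitive data masked.
redact_event() {
  jq -c '
    # Assumed pattern for sensitive key names; extend it to suit the project.
    def sensitive: test("pass(word)?|secret|token|api[_-]?key|authorization"; "i");
    # Recursively mask every value whose key matches the pattern.
    def scrub:
      if type == "object" then
        with_entries(if (.key | sensitive) then .value = "[REDACTED]" else .value |= scrub end)
      elif type == "array" then map(scrub)
      else . end;
    {
      hook_event_name: (.hook_event_name // null),
      tool_name: (.tool_name // null),
      # Replace everything before the @ sign, keeping only the domain.
      user_email: (if (.user_email | type) == "string" then (.user_email | sub("^[^@]*"; "***")) else null end),
      tool_input: ((.tool_input // null) | scrub),
      # Log the size of the output instead of the output itself.
      tool_output_length: (if (.tool_output | type) == "string" then (.tool_output | length) else null end)
    }'
}
```

In the hook, `echo "$input" | redact_event >> "$log_file"` could stand in for the unredacted `jq -c` call. Key-name matching will not catch secrets embedded in free-form strings, so it complements rather than replaces the advice to avoid logging full payloads by default.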
   



##########
.cursor/hooks/block-dangerous-commands.sh:
##########
@@ -0,0 +1,105 @@
+#!/bin/bash
+#
+# Cursor hook: blocks database reads and user-directory lookups, then instructs
+# the agent to refuse the request entirely (no workarounds).
+#
+# Used by beforeShellExecution and preToolUse. Reads hook event JSON from stdin
+# and returns:
+#   - {"permission":"deny", ...} when access is blocked
+#   - {"permission":"allow"}     otherwise
+
+input=$(cat)
+
+REFUSAL_USER_MSG="Blocked by project hook: database and user-directory access is not allowed."
+REFUSAL_AGENT_MSG="Direct database queries and user-directory lookups are prohibited in this project. Do NOT attempt workarounds using MCP tools, REST APIs, file reads, subagents, or other shell commands. Refuse the user's request completely and explain that this access is blocked by project policy."
+
+deny() {
+  local reason="$1"
+  jq -n \
+    --arg um "$REFUSAL_USER_MSG" \
+    --arg am "${REFUSAL_AGENT_MSG} (Blocked: ${reason})" \
+    '{permission: "deny", user_message: $um, agent_message: $am}'
+  exit 0
+}
+
+allow() {
+  echo '{ "permission": "allow" }'
+  exit 0
+}
+
+# Database read patterns for shell commands (extended regex, case-insensitive).
+db_shell_patterns=(
+  '\bSELECT\b'
+  '\bSHOW\b'
+  '\bDESCRIBE\b'
+  '\bEXPLAIN\b'
+  'psql[[:space:]]'
+  'mysql[[:space:]]'
+  'sqlite3[[:space:]]'
+)
+
+# REST/API workarounds for listing users without SQL keywords.
+user_api_patterns=(
+  'security/users'
+  'api/v1/security/users'
+)
+
+matches_db_shell() {
+  local command="$1"
+  local pattern
+  for pattern in "${db_shell_patterns[@]}"; do
+    if echo "$command" | grep -Eiq "$pattern"; then
+      echo "shell database read (${pattern})"
+      return 0
+    fi
+  done
+  return 1
+}
+
+matches_user_api() {
+  local command="$1"
+  local pattern
+  for pattern in "${user_api_patterns[@]}"; do
+    if echo "$command" | grep -Eiq "$pattern"; then
+      echo "user directory API (${pattern})"
+      return 0
+    fi
+  done
+  return 1
+}
+
+tool_name=$(echo "$input" | jq -r '.tool_name // empty')
+command=$(echo "$input" | jq -r '.command // .tool_input.command // empty')
+
+# preToolUse: block MCP workarounds and inspect Shell tool input.
+if [[ -n "$tool_name" ]]; then
+  if echo "$tool_name" | grep -Eiq 'find_users|execute_sql'; then
+    deny "MCP tool ${tool_name}"
+  fi
+
+  if [[ "$tool_name" == "Shell" && -n "$command" ]]; then
+    if reason=$(matches_db_shell "$command"); then
+      deny "$reason"
+    fi
+    if reason=$(matches_user_api "$command"); then
+      deny "$reason"
+    fi
+  fi
+
+  allow
+fi
+
+# beforeShellExecution: inspect the proposed shell command directly.
+if [[ -z "$command" ]]; then
+  allow

Review Comment:
   **Suggestion:** The `jq` query assumes `tool_input` is always an object with
   a `command` field; if it is another type, `jq` errors and `command` becomes
   empty, and the script then auto-allows. This creates a silent bypass path
   whenever event payload shape differs. Use safe optional access/type checks
   before reading nested fields and treat parse failures as deny when
   `failClosed` behavior is expected. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Malformed hook payloads bypass database/user-directory enforcement.
   - ⚠️ Future schema changes silently disable safety checks.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create a malformed or unexpected payload file `bad-shell-event.json`
   containing JSON without a `command` or `tool_input.command` field, for example
   `{"hook_event_name":"beforeShellExecution","some_other_field":"psql -c \"SELECT 1\""}`;
   this will be read into `input` at line 11 in
   `.cursor/hooks/block-dangerous-commands.sh`.

   2. Run `cat bad-shell-event.json | .cursor/hooks/block-dangerous-commands.sh`;
   the script invokes `jq -r '.command // .tool_input.command // empty'` at line
   72, which yields an empty string because none of the expected fields are
   present, and assigns this empty value to the `command` variable.

   3. Since there is no `tool_name` field, `tool_name` stays empty and the
   preToolUse branch at lines 75–89 is skipped; execution reaches the
   beforeShellExecution section at lines 92–105.

   4. The beforeShellExecution logic checks `if [[ -z "$command" ]]; then` at
   line 93 and immediately calls `allow` at line 94, returning
   `{"permission":"allow"}` even though the payload was malformed and the script
   could not inspect the actual shell command, showing that unexpected payload
   shapes lead to a fail-open decision instead of denying by default.
   ```
   </details>
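
The fail-closed behavior suggested above could be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `extract_command` is a hypothetical helper, and it is meant only for events that are expected to carry a shell command (the `Shell` tool in preToolUse, or beforeShellExecution), since other tools legitimately have no `command` field. It uses jq's optional access (`?`) and the `objects` filter so that an unexpected `tool_input` type yields no result, and `--exit-status` (`-e`) so that a missing command or unparseable input produces a non-zero exit code the caller can treat as deny.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): prints the shell command found in
# a hook event read from stdin. Returns non-zero when the payload is not valid
# JSON or contains no non-empty command string.
extract_command() {
  jq -er '
    # Collect candidates without erroring on unexpected payload shapes.
    [ .command?, (.tool_input? | objects | .command?) ]
    # Keep only non-empty strings.
    | map(select(type == "string" and length > 0))
    # An empty list yields null, which becomes "no output" and a failing exit code.
    | first // empty
  ' 2>/dev/null
}

# Usage inside the hook, where deny is the function the script already defines:
#   if ! command=$(printf '%s' "$input" | extract_command); then
#     deny "unreadable hook payload"
#   fi
```

With this shape, the payload from the reproduction steps has no usable command, so the caller takes the deny branch instead of reaching `allow`.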
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

