codeant-ai-for-open-source[bot] commented on code in PR #41264:
URL: https://github.com/apache/superset/pull/41264#discussion_r3447628562
##########
.cursor/hooks/audit-tool-calls.sh:
##########
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Cursor audit hook: append agent tool call details to JSONL logs under
+# .cursor/audit-logs/. Intended for postToolUse and postToolUseFailure.
+#
+# Always exits 0 so auditing never blocks the agent.
+
+input=$(cat)
+
+log_dir=".cursor/audit-logs"
+mkdir -p "$log_dir"
+
+date_str=$(date -u +"%Y-%m-%d")
+log_file="${log_dir}/${date_str}.jsonl"
+timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+
+event=$(echo "$input" | jq -r '.hook_event_name // "unknown"')
+
+case "$event" in
+ postToolUse)
+ status="success"
+ ;;
+ postToolUseFailure)
+ status=$(echo "$input" | jq -r '.failure_type // "failure"')
+ ;;
+ *)
+ status="$event"
+ ;;
+esac
+
+# Cap large tool output to keep log files manageable.
+echo "$input" | jq -c \
+ --arg ts "$timestamp" \
+ --arg status "$status" \
+ '{
+ timestamp: $ts,
+ status: $status,
+ hook_event_name: .hook_event_name,
+ conversation_id: (.conversation_id // null),
+ generation_id: (.generation_id // null),
+ tool_name: (.tool_name // null),
+ tool_use_id: (.tool_use_id // null),
+ cwd: (.cwd // null),
+ duration_ms: (.duration // null),
+ model: (.model // null),
+ user_email: (.user_email // null),
+ tool_input: (.tool_input // null),
+ tool_output: (
+ if (.tool_output | type) == "string" and (.tool_output | length) > 8000 then
+ .tool_output[0:8000] + "...[truncated]"
+ else
+ .tool_output // null
Review Comment:
**Suggestion:** The audit record stores raw `tool_input`, `tool_output`, and
`user_email` without redaction, which can persist credentials, tokens, personal
data, and query results in plaintext logs. Add redaction/masking for sensitive
fields before writing to disk and avoid logging full payloads by default.
[security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Audit logs persist user emails and tool payload secrets.
- ⚠️ Local developers risk leaking credentials via synced logs.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Trigger any Cursor tool call so that the audit hook `.cursor/hooks/audit-tool-calls.sh` runs with a payload containing `user_email`, `tool_input`, and `tool_output` fields (these are read and logged at lines 46–52).
2. For a concrete test, create a file `sensitive-event.json` with JSON like `{"hook_event_name":"postToolUse","user_email":"[email protected]","tool_input":{"password":"secret123"},"tool_output":"API token: abc123"}` and pipe it to the hook: `cat sensitive-event.json | .cursor/hooks/audit-tool-calls.sh`.
3. The hook reads stdin into `input` (line 8), computes metadata (lines 13–29), and then runs `jq -c` at lines 32–58, embedding `.user_email`, `.tool_input` and `.tool_output` directly into the log record without masking (lines 46–52).
4. Open `.cursor/audit-logs/<YYYY-MM-DD>.jsonl` and observe that the resulting JSONL entry contains the full `user_email`, the entire `tool_input` object (including `password`), and the `tool_output` string (including `API token: abc123`), demonstrating that sensitive user and credential data are persisted in plaintext audit logs.
```
</details>
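One possible shape for the redaction step, as a minimal sketch rather than a drop-in fix. The function name `redact_event`, the key pattern, and the `kind`/`chars` summary fields are illustrative assumptions, not part of the PR. It assumes jq 1.6 or newer (for `walk` and regex `test`). The sketch masks `user_email`, replaces values under sensitive-looking keys in `tool_input`, and stores only a type and size summary of `tool_output`, because secrets inside free-text output cannot be found by key name.

```shell
#!/bin/bash
# Hypothetical helper: redact an audit event (JSON on stdin) before it is logged.
# The key pattern below is an assumption and will over-match (e.g. "max_tokens").
redact_event() {
  jq -c '
    # Replace values whose key looks sensitive, at any nesting depth.
    def scrub:
      walk(
        if type == "object" then
          with_entries(
            if (.key | test("pass(word)?|secret|token|api[_-]?key|authorization"; "i"))
            then .value = "[REDACTED]"
            else .
            end
          )
        else .
        end
      );
    # Record only the kind and size of the output, never its content.
    def summarize:
      if . == null then null
      else {kind: type, chars: (if type == "string" then . else tojson end | length)}
      end;
    .user_email |= (if . == null then null else "[REDACTED]" end)
    | .tool_input = (.tool_input | scrub)
    | .tool_output = (.tool_output | summarize)
  '
}
```

In the hook, the event could be piped through `redact_event` before the existing `jq -c` projection, so the log record is built from the redacted event only. Whether full payloads should ever be logged (for example behind an opt-in environment variable) is a policy decision this sketch leaves open.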
##########
.cursor/hooks/block-dangerous-commands.sh:
##########
@@ -0,0 +1,105 @@
+#!/bin/bash
+#
+# Cursor hook: blocks database reads and user-directory lookups, then instructs
+# the agent to refuse the request entirely (no workarounds).
+#
+# Used by beforeShellExecution and preToolUse. Reads hook event JSON from stdin
+# and returns:
+# - {"permission":"deny", ...} when access is blocked
+# - {"permission":"allow"} otherwise
+
+input=$(cat)
+
+REFUSAL_USER_MSG="Blocked by project hook: database and user-directory access is not allowed."
+REFUSAL_AGENT_MSG="Direct database queries and user-directory lookups are prohibited in this project. Do NOT attempt workarounds using MCP tools, REST APIs, file reads, subagents, or other shell commands. Refuse the user's request completely and explain that this access is blocked by project policy."
+
+deny() {
+ local reason="$1"
+ jq -n \
+ --arg um "$REFUSAL_USER_MSG" \
+ --arg am "${REFUSAL_AGENT_MSG} (Blocked: ${reason})" \
+ '{permission: "deny", user_message: $um, agent_message: $am}'
+ exit 0
+}
+
+allow() {
+ echo '{ "permission": "allow" }'
+ exit 0
+}
+
+# Database read patterns for shell commands (extended regex, case-insensitive).
+db_shell_patterns=(
+ '\bSELECT\b'
+ '\bSHOW\b'
+ '\bDESCRIBE\b'
+ '\bEXPLAIN\b'
+ 'psql[[:space:]]'
+ 'mysql[[:space:]]'
+ 'sqlite3[[:space:]]'
+)
+
+# REST/API workarounds for listing users without SQL keywords.
+user_api_patterns=(
+ 'security/users'
+ 'api/v1/security/users'
+)
+
+matches_db_shell() {
+ local command="$1"
+ local pattern
+ for pattern in "${db_shell_patterns[@]}"; do
+ if echo "$command" | grep -Eiq "$pattern"; then
+ echo "shell database read (${pattern})"
+ return 0
+ fi
+ done
+ return 1
+}
+
+matches_user_api() {
+ local command="$1"
+ local pattern
+ for pattern in "${user_api_patterns[@]}"; do
+ if echo "$command" | grep -Eiq "$pattern"; then
+ echo "user directory API (${pattern})"
+ return 0
+ fi
+ done
+ return 1
+}
+
+tool_name=$(echo "$input" | jq -r '.tool_name // empty')
+command=$(echo "$input" | jq -r '.command // .tool_input.command // empty')
+
+# preToolUse: block MCP workarounds and inspect Shell tool input.
+if [[ -n "$tool_name" ]]; then
+ if echo "$tool_name" | grep -Eiq 'find_users|execute_sql'; then
+ deny "MCP tool ${tool_name}"
+ fi
+
+ if [[ "$tool_name" == "Shell" && -n "$command" ]]; then
+ if reason=$(matches_db_shell "$command"); then
+ deny "$reason"
+ fi
+ if reason=$(matches_user_api "$command"); then
+ deny "$reason"
+ fi
+ fi
+
+ allow
+fi
+
+# beforeShellExecution: inspect the proposed shell command directly.
+if [[ -z "$command" ]]; then
+ allow
Review Comment:
**Suggestion:** The `jq` query assumes `tool_input` is always an object with
a `command` field; if it is another type, `jq` errors and `command` becomes
empty, and the script then auto-allows. This creates a silent bypass path
whenever event payload shape differs. Use safe optional access/type checks
before reading nested fields and treat parse failures as deny when `failClosed`
behavior is expected. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Malformed hook payloads bypass database/user-directory enforcement.
- ⚠️ Future schema changes silently disable safety checks.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Create a malformed or unexpected payload file `bad-shell-event.json` containing JSON without a `command` or `tool_input.command` field, for example `{"hook_event_name":"beforeShellExecution","some_other_field":"psql -c \"SELECT 1\""}`; this will be read into `input` at line 11 in `.cursor/hooks/block-dangerous-commands.sh`.
2. Run `cat bad-shell-event.json | .cursor/hooks/block-dangerous-commands.sh`; the script invokes `jq -r '.command // .tool_input.command // empty'` at line 72, which yields an empty string because none of the expected fields are present, and assigns this empty value to the `command` variable.
3. Since there is no `tool_name` field, `tool_name` stays empty and the preToolUse branch at lines 75–89 is skipped; execution reaches the beforeShellExecution section at lines 92–105.
4. The beforeShellExecution logic checks `if [[ -z "$command" ]]; then` at line 93 and immediately calls `allow` at line 94, returning `{"permission":"allow"}` even though the payload was malformed and the script could not inspect the actual shell command, showing that unexpected payload shapes lead to a fail-open decision instead of denying by default.
```
</details>
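A minimal sketch of a fail-closed extraction for the shell-inspection path. The names `extract_command` and `decide_shell_event` are hypothetical and not taken from the PR. The helper checks types explicitly before reading nested fields and exits non-zero on invalid JSON, a non-object event, or a missing or empty command, so the caller can deny instead of allowing. It keeps the original preference order (`.command` first, then `.tool_input.command`). For preToolUse events from non-Shell tools a missing command is expected, so this check is meant only for the path that has to inspect a shell command.

```shell
#!/bin/bash
# Hypothetical helper: print the shell command from a hook event (JSON on stdin).
# Exits non-zero when the payload is unparseable or carries no usable command.
extract_command() {
  jq -r '
    # A value is usable only if it is a non-empty string.
    def usable: if type == "string" then length > 0 else false end;
    if type != "object" then error("event is not an object")
    elif (.command | usable) then .command
    elif (.tool_input | type) != "object" then error("no usable command")
    elif (.tool_input.command | usable) then .tool_input.command
    else error("no usable command")
    end
  ' 2>/dev/null
}

# Example decision: deny when the command cannot be inspected.
decide_shell_event() {
  local cmd
  if ! cmd=$(extract_command); then
    echo '{"permission":"deny"}'
  else
    # Placeholder: the existing pattern checks would run on "$cmd" here.
    echo '{"permission":"allow"}'
  fi
}
```

With this shape, the payload from the reproduction steps (no `command` and no `tool_input.command`) produces a deny decision rather than falling through to `allow`.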
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]