codeant-ai-for-open-source[bot] commented on code in PR #36821:
URL: https://github.com/apache/superset/pull/36821#discussion_r2644899051
##########
docker/docker-bootstrap.sh:
##########
@@ -42,7 +42,7 @@ if [ "$CYPRESS_CONFIG" == "true" ]; then
PORT=8081
fi
# Skip postgres requirements installation for workers to avoid conflicts
-if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [
"$1" != "worker" ] && [ "$1" != "beat" ]; then
+if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [
"$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production"
]; then
Review Comment:
**Suggestion:** Using `whoami` in a command substitution can fail on minimal
images (or be unavailable); replace it with a numeric UID check (`id -u`) which
is more robust and avoids depending on `whoami`. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(id -u)" -eq 0 ] && [ "$1"
!= "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Replacing whoami with id -u -eq 0 is more robust in minimal containers and
POSIX-y. It's a sensible hardening to avoid runtime failure when whoami isn't
present. Be mindful to use the same approach consistently across the script.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** docker/docker-bootstrap.sh
**Line:** 45:45
**Comment:**
*Possible Bug: Using `whoami` in a command substitution can fail on
minimal images (or be unavailable); replace it with a numeric UID check (`id
-u`) which is more robust and avoids depending on `whoami`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
debug_hive.py:
##########
@@ -0,0 +1,44 @@
+import socket
+import sys
+
+host = 'hive-server'
+port = 10000
+
+print(f"Testing connectivity to {host}:{port}...")
+
+# Test 1: DNS Resolution
+try:
+ ip = socket.gethostbyname(host)
+ print(f"DNS Resolved: {host} -> {ip}")
+except Exception as e:
+ print(f"DNS Failed: {e}")
+ sys.exit(1)
+
+# Test 2: TCP Socket
+try:
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ s.settimeout(5)
+ s.connect((ip, port))
+ print(f"TCP Connection Successful to {ip}:{port}")
+ s.close()
Review Comment:
**Suggestion:** Resource leak: the TCP socket `s` is only closed on the
success path; if `s.connect()` raises an exception the socket remains open. Use
a context manager or ensure `s.close()` runs in a finally block so the socket
is always closed. [resource leak]
**Severity Level:** Minor ⚠️
```suggestion
# create_connection handles DNS/IPv4/IPv6 resolution and returns a
socket that supports context manager
with socket.create_connection((ip, port), timeout=5) as s:
print(f"TCP Connection Successful to {ip}:{port}")
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a valid resource-leak bug: if connect() raises, the socket object
may be left open. The improved code uses socket.create_connection within a
context manager which ensures the socket is closed on all paths and simplifies
timeout handling. It's a straightforward correctness fix.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** debug_hive.py
**Line:** 19:23
**Comment:**
*Resource Leak: Resource leak: the TCP socket `s` is only closed on the
success path; if `s.connect()` raises an exception the socket remains open. Use
a context manager or ensure `s.close()` runs in a finally block so the socket
is always closed.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
docker/Dockerfile.hive:
##########
@@ -0,0 +1,16 @@
+FROM apache/superset:latest
Review Comment:
**Suggestion:** Using the unpinned image tag `:latest` makes builds
non-reproducible and can introduce breaking or insecure changes; expose a build
argument so callers can pin the Superset base image/version. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
ARG SUPERSET_VERSION=latest
FROM apache/superset:${SUPERSET_VERSION}
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Using :latest makes image builds non-reproducible and can silently pull
breaking or insecure updates. Exposing a build ARG to allow pinning the
Superset base image is a minimal, low-risk improvement that helps reproducible
builds.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** docker/Dockerfile.hive
**Line:** 1:1
**Comment:**
*Possible Bug: Using the unpinned image tag `:latest` makes builds
non-reproducible and can introduce breaking or insecure changes; expose a build
argument so callers can pin the Superset base image/version.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
docker/Dockerfile.hive:
##########
@@ -0,0 +1,16 @@
+FROM apache/superset:latest
+
+USER root
+
+# Install system dependencies for thrift-sasl
+RUN apt-get update -q \
+ && apt-get install -yq --no-install-recommends \
+ libsasl2-dev \
+ build-essential \
+ libpq-dev \
+ && rm -rf /var/lib/apt/lists/*
+
+# Install python dependencies inside the virtual environment using uv
+RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl
psycopg2-binary
Review Comment:
**Suggestion:** The command `uv pip ...` is invalid/unknown in the base
image and will cause the RUN step to fail; also installing into a virtualenv
path that may not exist will break the build. Use the venv's pip if present,
otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving
pip cache in the layer. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
RUN if [ -x /app/.venv/bin/pip ]; then \
/app/.venv/bin/pip install --no-cache-dir pyhive thrift thrift-sasl
psycopg2-binary; \
else \
python3 -m pip install --no-cache-dir pyhive thrift thrift-sasl
psycopg2-binary; \
fi
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The RUN line in the PR uses "uv pip ..." which looks wrong for the
apache/superset image — "uv" is not a standard wrapper and will likely make the
build fail. The proposed fallback to use the venv pip if present or python3 -m
pip is sensible and the addition of --no-cache-dir avoids leaving pip cache in
the layer. This directly fixes a build-time break.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** docker/Dockerfile.hive
**Line:** 14:14
**Comment:**
*Possible Bug: The command `uv pip ...` is invalid/unknown in the base
image and will cause the RUN step to fail; also installing into a virtualenv
path that may not exist will break the build. Use the venv's pip if present,
otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving
pip cache in the layer.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
docker/docker-init.sh:
##########
@@ -16,6 +16,8 @@
# limitations under the License.
#
set -e
+exec > >(tee -a /app/docker/debug.log) 2>&1
Review Comment:
**Suggestion:** Robustness issue: if /app/docker doesn't exist or isn't
writable, the process-substitution tee can fail or exit and break
logging/output; check that the directory/file are creatable and writable and
fall back to console-only logging when not writable. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
if mkdir -p /app/docker && touch /app/docker/debug.log && [ -w
/app/docker/debug.log ]; then
exec > >(tee -a /app/docker/debug.log) 2>&1
else
echo "Warning: cannot write to /app/docker/debug.log, continuing with
stdout/stderr only"
fi
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The improved code safely checks that the directory/file exist and are
writable and falls back to console logging if not — that's a practical
robustness improvement.
Note: in bash with set -e, commands inside the if condition won't abort the
script, so this construct is safe; you may still want to combine this with
permission setting or clear logging policy for mounted volumes.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** docker/docker-init.sh
**Line:** 19:19
**Comment:**
*Possible Bug: Robustness issue: if /app/docker doesn't exist or isn't
writable, the process-substitution tee can fail or exit and break
logging/output; check that the directory/file are creatable and writable and
fall back to console-only logging when not writable.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]