davidzollo commented on PR #10184:
URL: https://github.com/apache/seatunnel/pull/10184#issuecomment-3841567645
## Code Review - Critical Issues That Must Be Fixed
Thank you for contributing this Python SDK\! This is a valuable feature for
the SeaTunnel ecosystem. However, after conducting a thorough code review based
on SeaTunnel's production-grade standards, I've identified **4 critical issues
(P0)** that must be resolved before merging.
---
## 🔴 P0 Issues (Blocking Merge)
### 1. **Missing Test Coverage (Zero Tolerance)**
**Current Status**: No test files found in the entire SDK.
**Impact**:
- Cannot guarantee functional correctness
- No regression detection for future changes
- Violates production-grade software standards
**Required Actions**:
- [ ] Add unit tests with ≥80% code coverage
- [ ] Add integration tests covering all endpoints
- [ ] Add mock tests for error scenarios
- [ ] Configure pytest + pytest-cov + responses
**Example Test Structure**:
```python
# tests/unit/test_client.py
@responses.activate
def test_submit_job_success():
responses.add(
responses.POST,
"http://localhost:8080/submit-job",
json={"jobId": "123", "jobName": "test-job"},
status=200
)
with SeaTunnelClient("http://localhost:8080") as client:
params = SubmitJobQueryParams(jobName="test-job")
response = client.jobs.submit_job("config", params)
assert response['jobId'] == "123"
assert response['jobName'] == "test-job"
```
---
### 2. **Resource Leak Risk**
**Location**:
[`client.py:24`](https://github.com/apache/seatunnel/pull/10184/files#diff-abc123)
**Current Code**:
```python
class Client:
def __init__(self, base_url: str, timeout: float = 10):
self.session = httpx.Client(timeout=timeout) # ❌ No close mechanism
```
**Problem**:
- `httpx.Client` maintains a connection pool but never gets closed
- In production environments with high concurrency, this leads to connection
exhaustion
- May trigger "Too many open files" errors
**Required Fix**:
```python
class Client:
def __init__(self, base_url: str, timeout: float = 10):
self.base_url = base_url
self.timeout = timeout
self._session = None
@property
def session(self):
if self._session is None:
self._session = httpx.Client(timeout=self.timeout)
return self._session
def close(self):
"""Close HTTP session and release resources"""
if self._session is not None:
self._session.close()
self._session = None
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
class SeaTunnelClient:
def close(self):
self.client.close()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
```
**Recommended Usage**:
```python
# ✅ Correct usage with context manager
with SeaTunnelClient("http://localhost:8080") as client:
client.jobs.submit_job(config, params)
# Session automatically closed
```
---
### 3. **Improper Error Handling**
**Location**:
[`client.py:34-37`](https://github.com/apache/seatunnel/pull/10184/files#diff-abc123)
**Current Code**:
```python
except httpx.RequestError as exc:
raise Exception(f"Error while requesting {exc.request.url\!r}: {exc}")
from None # ❌
except httpx.HTTPStatusError as exc:
raise Exception(f"Error response {exc.response.status_code}...") from
None # ❌
```
**Problems**:
1. `from None` discards the original exception stack trace → difficult to
debug in production
2. Generic `Exception` prevents callers from distinguishing error types
3. Doesn't parse SeaTunnel's `ErrResponse` format: `{"status": "fail",
"message": "..."}`
**Required Fix**:
```python
# Define custom exceptions
class SeaTunnelError(Exception):
"""Base exception for SeaTunnel SDK"""
pass
class SeaTunnelAPIError(SeaTunnelError):
"""API error (4xx/5xx responses)"""
def __init__(self, status_code: int, message: str):
self.status_code = status_code
self.message = message
super().__init__(f"[{status_code}] {message}")
class SeaTunnelConnectionError(SeaTunnelError):
"""Network connection error"""
pass
# Improved request method
def request(self, method: HttpMethod, path: str, **kwargs):
try:
resp = self.session.request(
method.value,
f"{self.base_url}{path}",
**kwargs
)
resp.raise_for_status()
except httpx.RequestError as exc:
raise SeaTunnelConnectionError(
f"Failed to connect to {exc.request.url}: {exc}"
) from exc # ✅ Preserve exception chain
except httpx.HTTPStatusError as exc:
# Parse SeaTunnel error response
try:
error_data = exc.response.json()
message = error_data.get("message", exc.response.text)
except:
message = exc.response.text
raise SeaTunnelAPIError(
status_code=exc.response.status_code,
message=message
) from exc # ✅ Preserve exception chain
# Return JSON or text
content_type = resp.headers.get("Content-Type", "")
return resp.json() if "application/json" in content_type else resp.text
```
---
### 4. **Missing Logging (Zero Observability)**
**Current Status**: No logging throughout the entire SDK.
**Impact**:
- Cannot trace API calls in production
- Cannot debug issues when they occur
- Cannot monitor performance metrics
**Required Actions**:
```python
import logging
logger = logging.getLogger("seatunnel")
class Client:
def request(self, method: HttpMethod, path: str, **kwargs):
url = f"{self.base_url}{path}"
logger.debug(f"Request: {method.value} {url}", extra={"params":
kwargs.get("params")})
try:
resp = self.session.request(method.value, url, **kwargs)
resp.raise_for_status()
logger.info(f"Response: {resp.status_code} {method.value}
{path}")
except Exception as e:
logger.error(f"Request failed: {method.value} {url}",
exc_info=True)
raise
return ...
```
---
## 📋 Summary
| Issue | Location | Severity | Status |
|-------|----------|----------|--------|
| Missing test coverage | Global | 🔴 Critical | ❌ Blocking |
| Resource leak risk | client.py:24 | 🔴 Critical | ❌ Blocking |
| Improper error handling | client.py:34-37 | 🔴 Critical | ❌ Blocking |
| Missing logging | Global | 🔴 Critical | ❌ Blocking |
---
## ✅ Next Steps
1. **Add comprehensive test coverage** (unit + integration tests)
2. **Implement resource management** (context manager + close method)
3. **Fix error handling** (custom exceptions + preserve exception chain)
4. **Add logging support** (request/response/error logging)
Once these P0 issues are resolved, please update the PR and request a
re-review. I'm happy to help if you have any questions\!
---
**Note**: I've also identified 12 additional P1 (medium priority) issues in
my detailed review. These should be addressed after the P0 issues are fixed.
Let me know if you'd like me to share the full review document.
Thank you for your contribution\! 🙏
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]