asf-tooling commented on issue #641:
URL:
https://github.com/apache/tooling-trusted-releases/issues/641#issuecomment-4410118413
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@2da7807a`
**Type:** `refactor` • **Classification:** `actionable` •
**Confidence:** `medium`
**Application domain(s):** `shared_infrastructure`
### Summary
The issue asks to reduce the size and heterogeneity of `atr/util.py` by
extracting coherent subsets into dedicated modules. The prior discussion
identified themes: email methods, file system methods, session methods, and
path methods (some already moved to `atr/paths.py` per @alitheg). @sbp
suggested extracting the largest possible subset first and doing this
periodically. @dave2wave proposed `email`, `filesystem`, `session`, and
additional `path` methods as candidate groupings. The code remains large and
heterogeneous today—clearly actionable.
### Where this lives in the code today
#### `atr/util.py` — `email_from_uid` (lines 405-410)
_needs modification_
Email-parsing helper that belongs in a dedicated email utilities module.
```python
def email_from_uid(uid: str) -> str | None:
if m := re.search(r"<([^>]+)>", uid):
return m.group(1).lower()
elif m := re.search(r"^([^@ ]+)@apache.org$", uid):
return uid
return None
```
#### `atr/util.py` — `asf_uid_from_email` (lines 171-176)
_needs modification_
LDAP email lookup, part of the email/identity theme identified by @dave2wave.
```python
def asf_uid_from_email(email: str) -> str | None:
ldap_params = ldap.SearchParameters(email_query=email)
ldap.search(ldap_params)
if not (ldap_params.results_list and ldap_params.results_list[0].uid):
return None
return ldap_params.results_list[0].uid[0]
```
#### `atr/util.py` — `atomic_write_file` (lines 242-255)
_needs modification_
Filesystem utility function, part of the filesystem theme identified by
@dave2wave.
```python
async def atomic_write_file(file_path: pathlib.Path, content: str, encoding:
str = "utf-8") -> None:
"""Atomically write content to a file using a temporary file."""
await aiofiles.os.makedirs(file_path.parent, exist_ok=True)
temp_path = file_path.parent / f".{file_path.name}.{uuid.uuid4()}.tmp"
try:
async with aiofiles.open(temp_path, "w", encoding=encoding) as f:
await f.write(content)
await f.flush()
await asyncio.to_thread(os.fsync, f.fileno())
await aiofiles.os.rename(temp_path, file_path)
except Exception:
with contextlib.suppress(FileNotFoundError):
await aiofiles.os.remove(temp_path)
raise
```
#### `atr/util.py` — `create_secure_session` (lines 373-379)
_needs modification_
Session/TLS creation function, part of the session theme identified by
@dave2wave.
```python
def create_secure_session(
timeout: aiohttp.ClientTimeout | None = None,
) -> aiohttp.ClientSession:
"""Create a secure aiohttp.ClientSession with hardened SSL/TLS
configuration."""
connector = aiohttp.TCPConnector(ssl=create_secure_ssl_context())
# We pass the timeout to the ClientSession constructor
return aiohttp.ClientSession(connector=connector, timeout=timeout)
```
#### `atr/util.py` — `key_ssh_fingerprint_core` (lines 651-672)
_needs modification_
SSH key utility that could live in a dedicated keys/crypto module.
```python
def key_ssh_fingerprint_core(ssh_key_string: str) -> str:
# The format should be as in *.pub or authorized_keys files
# I.e. TYPE DATA COMMENT
ssh_key_parts = ssh_key_string.strip().split()
if len(ssh_key_parts) >= 2:
# We discard the type, which is ssh_key_parts[0]
key_data = ssh_key_parts[1]
# We discard the comment, which is ssh_key_parts[2]
# Standard fingerprint calculation
try:
decoded_key_data = base64.b64decode(key_data)
except binascii.Error as e:
raise ValueError(f"Invalid base64 encoding in key data: {e}")
from e
digest = hashlib.sha256(decoded_key_data).digest()
fingerprint_b64 =
base64.b64encode(digest).decode("utf-8").rstrip("=")
# Prefix follows the standard format
return f"SHA256:{fingerprint_b64}"
raise ValueError("Invalid SSH key format")
```
#### `atr/util.py` — `chmod_directories` (lines 258-262)
_needs modification_
Filesystem permissions helper, part of the filesystem theme.
```python
def chmod_directories(path: os.PathLike, permissions: int =
DIRECTORY_PERMISSIONS) -> None:
os.chmod(path, permissions)
for dir_path in pathlib.Path(path).rglob("*"):
if dir_path.is_dir():
os.chmod(dir_path, permissions)
```
#### `atr/util.py` — `paths_recursive` (lines 766-778)
_needs modification_
Path traversal utility already partially related to atr/paths.py; candidate
for filesystem module.
```python
async def paths_recursive(base_path: pathlib.Path | safe.StatePath) ->
AsyncGenerator[safe.RelPath]:
"""Yield all file paths recursively within a base path, relative to the
base path."""
if (resolved_base_path := await is_dir_resolve(base_path)) is None:
return
async for rel_path in paths_recursive_all(base_path):
abs_path_to_check = resolved_base_path / rel_path
with contextlib.suppress(FileNotFoundError, OSError):
if await aiofiles.os.path.isfile(abs_path_to_check):
try:
yield safe.RelPath.from_path(rel_path)
except ValueError as err:
msg = f"Unsafe relative path {str(rel_path)!r}: {err}"
raise ValueError(msg) from err
```
### Where new code would go
- `atr/filesystem.py` — new file
Home for filesystem operations: atomic_write_file, atomic_modify_file,
chmod_directories, chmod_files, create_hard_link_clone,
delete_immutable_directory, async_temporary_directory, paths_recursive,
paths_recursive_all, content_list, is_dir_resolve, is_disallowed_dotfile,
FileStat, DIRECTORY_PERMISSIONS
- `atr/email_util.py` — new file
Home for email/identity functions: email_from_uid, asf_uid_from_email,
asf_uid_from_uids, email_mid_from_thread_id, email_to_uid_map, EmailRecipients
protocol, and related constants (INCUBATOR_GENERAL_ADDRESS, USER_TESTS_ADDRESS,
DEV_TEST_MID, DEV_THREAD_URLS)
- `atr/http.py` — new file
Home for HTTP session/TLS functions: create_secure_session,
create_secure_ssl_context, get_urls_as_completed, FetchError, and TLS constants
### Proposed approach
Following the themes identified by @dave2wave and the 'largest subset first'
principle from @sbp, the recommended approach is to extract three new modules
in order of size:
1. **`atr/filesystem.py`** — the largest coherent group (~12 functions):
`atomic_write_file`, `atomic_modify_file`, `chmod_directories`, `chmod_files`,
`create_hard_link_clone`, `delete_immutable_directory`,
`async_temporary_directory`, `paths_recursive`, `paths_recursive_all`,
`content_list`, `is_dir_resolve`, `is_disallowed_dotfile`, `FileStat`,
`DIRECTORY_PERMISSIONS`, `has_files`, `number_of_release_files`,
`get_release_stats`. Note that `atr/paths.py` already handles path
construction; this new module handles file I/O and traversal.
2. **`atr/email_util.py`** (or `atr/email.py` if no conflict) — email
parsing and LDAP identity functions: `email_from_uid`, `asf_uid_from_email`,
`asf_uid_from_uids`, `email_mid_from_thread_id`, `email_to_uid_map`, the
`EmailRecipients` protocol, and mailing-list constants.
3. **`atr/http.py`** — TLS/session creation and URL fetching:
`create_secure_session`, `create_secure_ssl_context`, `get_urls_as_completed`,
`FetchError`, `LISTS_APACHE_TIMEOUT`, and TLS constants.
Each extraction should preserve backward compatibility by re-exporting from
`atr/util.py` initially (simple `from atr.filesystem import ...` at the bottom
of util.py), then gradually updating importers in follow-up PRs. The note in
util.py that 'atr.db imports this module' must be respected—no new circular
dependencies.
### Suggested patches
#### `atr/filesystem.py`
New module for the filesystem operations group—the largest extractable
subset as @sbp requested.
````diff
--- /dev/null
+++ b/atr/filesystem.py
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import asyncio
+import contextlib
+import dataclasses
+import fcntl
+import os
+import pathlib
+import tempfile
+import uuid
+from collections.abc import AsyncGenerator, Callable
+from typing import Final
+
+import aiofiles.os
+import aiofiles.open
+import aioshutil
+
+import atr.log as log
+import atr.models.safe as safe
+
+DIRECTORY_PERMISSIONS: Final[int] = 0o755
+
+
[email protected]
+class FileStat:
+ path: str
+ modified: int
+ size: int
+ permissions: int
+ is_file: bool
+ is_dir: bool
+
+
[email protected]
+async def async_temporary_directory(
+ suffix: str | None = None, prefix: str | None = None, dir: str |
os.PathLike | None = None
+) -> AsyncGenerator[pathlib.Path]:
+ """Create an async temporary directory similar to
tempfile.TemporaryDirectory."""
+ temp_dir_path: str = await asyncio.to_thread(tempfile.mkdtemp,
suffix=suffix, prefix=prefix, dir=dir)
+ try:
+ yield pathlib.Path(temp_dir_path)
+ finally:
+ try:
+ await aioshutil.rmtree(temp_dir_path)
+ except Exception:
+ log.exception(f"Failed to remove temporary directory
{temp_dir_path}")
+
+
+async def atomic_modify_file(
+ file_path: pathlib.Path,
+ modify: Callable[[str], str],
+) -> None:
+ lock_path = file_path.with_suffix(file_path.suffix + ".lock")
+ lock_fd = await asyncio.to_thread(os.open, str(lock_path), os.O_CREAT |
os.O_RDWR)
+ try:
+ await asyncio.to_thread(fcntl.flock, lock_fd, fcntl.LOCK_EX)
+ try:
+ async with aiofiles.open(file_path, encoding="utf-8") as rf:
+ old_value = await rf.read()
+ new_value = modify(old_value)
+ if new_value != old_value:
+ await atomic_write_file(file_path, new_value)
+ finally:
+ fcntl.flock(lock_fd, fcntl.LOCK_UN)
+ finally:
+ await asyncio.to_thread(os.close, lock_fd)
+
+
+async def atomic_write_file(file_path: pathlib.Path, content: str,
encoding: str = "utf-8") -> None:
+ """Atomically write content to a file using a temporary file."""
+ await aiofiles.os.makedirs(file_path.parent, exist_ok=True)
+ temp_path = file_path.parent / f".{file_path.name}.{uuid.uuid4()}.tmp"
+ try:
+ async with aiofiles.open(temp_path, "w", encoding=encoding) as f:
+ await f.write(content)
+ await f.flush()
+ await asyncio.to_thread(os.fsync, f.fileno())
+ await aiofiles.os.rename(temp_path, file_path)
+ except Exception:
+ with contextlib.suppress(FileNotFoundError):
+ await aiofiles.os.remove(temp_path)
+ raise
+
+
+def chmod_directories(path: os.PathLike, permissions: int =
DIRECTORY_PERMISSIONS) -> None:
+ os.chmod(path, permissions)
+ for dir_path in pathlib.Path(path).rglob("*"):
+ if dir_path.is_dir():
+ os.chmod(dir_path, permissions)
+
+
+def chmod_files(path: os.PathLike, permissions: int) -> None:
+ """Set permissions on all files in a directory tree."""
+ for file_path in pathlib.Path(path).rglob("*"):
+ if file_path.is_file():
+ os.chmod(file_path, permissions)
+
+
+async def delete_immutable_directory(path: safe.StatePath, reason: str) ->
None:
+ if not reason:
+ raise ValueError("Reason is required to delete an immutable
directory")
+ log.info(f"Deleting immutable directory {path} because {reason}")
+ await asyncio.to_thread(chmod_directories, path, 0o755)
+ await aioshutil.rmtree(path)
+
+
+async def is_dir_resolve(path: os.PathLike) -> pathlib.Path | None:
+ try:
+ resolved_path = await asyncio.to_thread(pathlib.Path(path).resolve)
+ if not await aiofiles.os.path.isdir(resolved_path):
+ return None
+ except (FileNotFoundError, OSError):
+ return None
+ return resolved_path
+
+
+def is_disallowed_dotfile(segment: str) -> bool:
+ if not segment.startswith("."):
+ return False
+ if segment.startswith(".atr"):
+ return False
+ # Temporary, and only due to issues #757 and #769
+ if segment == ".gitkeep":
+ return False
+ return True
+
+
+async def paths_recursive(base_path: pathlib.Path | safe.StatePath) ->
AsyncGenerator[safe.RelPath]:
+ """Yield all file paths recursively within a base path, relative to the
base path."""
+ if (resolved_base_path := await is_dir_resolve(base_path)) is None:
+ return
+ async for rel_path in paths_recursive_all(base_path):
+ abs_path_to_check = resolved_base_path / rel_path
+ with contextlib.suppress(FileNotFoundError, OSError):
+ if await aiofiles.os.path.isfile(abs_path_to_check):
+ try:
+ yield safe.RelPath.from_path(rel_path)
+ except ValueError as err:
+ msg = f"Unsafe relative path {str(rel_path)!r}: {err}"
+ raise ValueError(msg) from err
+
+
+async def paths_recursive_all(base_path: os.PathLike) ->
AsyncGenerator[pathlib.Path]:
+ """Yield all file and directory paths recursively within a base path,
relative to the base path."""
+ if (resolved_base_path := await is_dir_resolve(base_path)) is None:
+ return
+ queue: list[pathlib.Path] = [resolved_base_path]
+ visited_abs_paths: set[pathlib.Path] = set()
+ while queue:
+ current_abs_item = queue.pop(0)
+ try:
+ resolved_current_abs_item = await
asyncio.to_thread(current_abs_item.resolve)
+ except (FileNotFoundError, OSError):
+ continue
+ if resolved_current_abs_item in visited_abs_paths:
+ continue
+ visited_abs_paths.add(resolved_current_abs_item)
+ try:
+ entries = await aiofiles.os.scandir(current_abs_item)
+ except (FileNotFoundError, PermissionError, OSError):
+ continue
+ for entry in sorted(entries, key=lambda e: e.name):
+ rel_path =
pathlib.Path(entry.path).relative_to(resolved_base_path)
+ yield rel_path
+ if entry.is_dir(follow_symlinks=False):
+ queue.append(pathlib.Path(entry.path))
````
#### `atr/util.py`
Re-export filesystem symbols from util.py for backward compatibility during
transition.
````diff
--- a/atr/util.py
+++ b/atr/util.py
@@ -47,6 +47,18 @@ import atr.models.validation as validation
import atr.paths as paths
import atr.registry as registry
import atr.user as user
+
+# Filesystem utilities have been extracted to atr.filesystem
+# Re-exported here for backward compatibility
+from atr.filesystem import ( # noqa: F401
+ DIRECTORY_PERMISSIONS,
+ FileStat,
+ async_temporary_directory,
+ atomic_modify_file,
+ atomic_write_file,
+ chmod_directories,
+ chmod_files,
+ delete_immutable_directory,
+ is_dir_resolve,
+ is_disallowed_dotfile,
+ paths_recursive,
+ paths_recursive_all,
+)
````
### Open questions
- The file is truncated so the full list of functions in util.py isn't
visible—there may be additional filesystem or email functions beyond what's
shown that should be grouped.
- Should backward-compat re-exports be permanent or removed after updating
all importers? @sbp may prefer immediate import updates.
- The `content_list`, `has_files`, `number_of_release_files`, and
`get_release_stats` functions depend on both filesystem operations and
`atr.paths`/`atr.models.sql`—should they go in `atr/filesystem.py` or stay in
`util.py` (or a release-specific module)?
- The `paths_recursive_all` function is truncated in the source—the diff
above includes a best-guess completion that must be verified against the actual
implementation.
- Naming: should the email module be `atr/email_util.py` (to avoid conflict
with stdlib `email`) or something else like `atr/mail.py`?
### Files examined
- `atr/util.py`
- `atr/models/helpers.py`
- `atr/models/safe.py`
- `atr/models/validation.py`
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]