DanielCarter-stack commented on PR #10598:
URL: https://github.com/apache/seatunnel/pull/10598#issuecomment-4055924818

   <!-- code-pr-reviewer -->
   <!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10598", "part": 1, 
"total": 1} -->
   ### Issue 1: Missing directory existence check and friendly error messages
   
   **Location**: `tools/documents/update_connector_change_log.py:105-106`
   ```python
   def write_commit(connector, prs, changelog_dir, commit_version_map, 
in_release):
       with open(changelog_dir + '/' + connector + '.md', 'w') as file:
   ```
   
   **Related Context**:
   - Caller: `main` function lines 101-102
   - Dependency: `changelog_dir` variable (lines 98-99)
   
   **Problem Description**:
   The `write_commit` function directly attempts to open a file for writing 
without checking in advance whether the `changelog_dir` directory exists. If 
the directory does not exist (as was the case before this PR's fix), it will 
throw `FileNotFoundError`, but the error message only shows the file path 
without clearly indicating it's a directory structure issue, which may confuse 
users.
   
   **Potential Risks**:
   - Risk 1: If the documentation directory is adjusted again in the future, 
users will see cryptic error messages
   - Risk 2: When the script fails in CI/CD environments, it's difficult to 
quickly identify the root cause
   - Risk 3: Contributors unfamiliar with the project structure may not know 
where the correct directory is
   
   **Impact Scope**:
   - Direct impact: Callers of the `write_commit` function
   - Indirect impact: All workflows that rely on this script to generate 
changelogs
   - Scope: Utility script (single module)
   
   **Severity**: MINOR
   
   **Improvement Suggestions**:
   ```python
   def write_commit(connector, prs, changelog_dir, commit_version_map, 
in_release):
       # Check if directory exists
       if not os.path.exists(changelog_dir):
           raise FileNotFoundError(
               f"Changelog directory does not exist: {changelog_dir}\n"
               f"Please ensure the documentation structure is correct. "
               f"Expected path: docs/en/connectors/changelog/ or 
docs/zh/connectors/changelog/"
           )
       
       # Check if directory is writable
       if not os.access(changelog_dir, os.W_OK):
           raise PermissionError(
               f"No write permission for directory: {changelog_dir}"
           )
       
       file_path = os.path.join(changelog_dir, connector + '.md')
       with open(file_path, 'w') as file:
           # ... Original logic
   ```
   
   **Rationale**: Adding a preliminary check can provide clearer error messages 
when errors occur, helping users quickly identify problems. Additionally, 
checking write permissions avoids failing after writing a large amount of data 
due to permission issues.
   
   ---
   
   ### Issue 2: Hard-coded paths result in high maintenance costs
   
   **Location**: `tools/documents/update_connector_change_log.py:98-99`
   ```python
   changelog_dir = os.path.join(directory, 'docs', 'en', 'connectors', 
'changelog')
   zh_changelog_dir = os.path.join(directory, 'docs', 'zh', 'connectors', 
'changelog')
   ```
   
   **Related Context**:
   - Usage location: Passed to `write_commit` at lines 101-102
   - Historical context: Changed from `connector-v2` to `connectors` (this PR)
   
   **Problem Description**:
   The path string is hard-coded in the code. Every time the documentation 
structure is adjusted, the code needs to be modified. This design lacks 
flexibility and increases maintenance costs. If the directory structure is 
adjusted again in the future, or if different documentation layouts need to be 
supported, the code will need to be modified again.
   
   **Potential Risks**:
   - Risk 1: During future directory structure adjustments, it's easy to miss 
modifying this script (as this PR demonstrates)
   - Risk 2: Cannot support custom documentation directories (e.g., users want 
to place documentation in different locations)
   - Risk 3: When paths are repeatedly defined across multiple scripts, 
inconsistencies can easily occur
   
   **Impact Scope**:
   - Direct impact: `main` function
   - Indirect impact: Script maintainability and flexibility
   - Scope: Utility script
   
   **Severity**: MINOR
   
   **Improvement Suggestions**:
   ```python
   # Option 1: Use constant definitions at the top of the file
   DEFAULT_CHANGELOG_DIR = 'docs/en/connectors/changelog'
   DEFAULT_ZH_CHANGELOG_DIR = 'docs/zh/connectors/changelog'
   
   def main():
       # ...
       changelog_dir = os.path.join(directory, DEFAULT_CHANGELOG_DIR)
       zh_changelog_dir = os.path.join(directory, DEFAULT_ZH_CHANGELOG_DIR)
       # ...
   
   # Option 2: Support command line argument override
   import argparse
   
   def main():
       parser = argparse.ArgumentParser(description='Update connector 
changelog')
       parser.add_argument('--changelog-dir', 
default='docs/en/connectors/changelog',
                           help='Path to changelog directory')
       parser.add_argument('--zh-changelog-dir', 
default='docs/zh/connectors/changelog',
                           help='Path to Chinese changelog directory')
       args = parser.parse_args()
       
       changelog_dir = os.path.join(directory, args.changelog_dir)
       zh_changelog_dir = os.path.join(directory, args.zh_changelog_dir)
       # ...
   
   # Option 3: Automatically detect directory location
   def find_changelog_directory(directory, language='en'):
       """自动查找 changelog 目录"""
       possible_paths = [
           f'docs/{language}/connectors/changelog',  # 新结构
           f'docs/{language}/connector-v2/changelog', # 旧结构(向后兼容)
       ]
       for path in possible_paths:
           full_path = os.path.join(directory, path)
           if os.path.exists(full_path):
               return full_path
       raise FileNotFoundError(f"Cannot find changelog directory for 
{language}")
   ```
   
   **Rationale**:
   - Approach 1 is the simplest, managing paths centrally through constants for 
easy modification
   - Approach 2 provides flexibility, allowing users to adjust paths without 
modifying code
   - Approach 3 is the most robust, automatically detecting directory locations 
and providing backward compatibility
   
   ---
   
   ### Issue 3: Missing script documentation and usage instructions
   
   **Location**: `tools/documents/update_connector_change_log.py:1-23`
   ```python
   #  Licensed to the Apache Software Foundation (ASF) under one or more
   #  contributor license agreements.  See the NOTICE file distributed with
   #  ...
   #  limitations under the License.
   
   
   import os
   import subprocess
   import html
   from packaging.version import Version
   from pathlib import Path
   ```
   
   **Related Context**:
   - File: `tools/documents/update_connector_change_log.py`
   - Directory: `tools/documents/`
   
   **Problem Description**:
   The script file lacks a docstring and usage instructions, containing only an 
Apache License header. New users or maintainers may be unclear about:
   - What is the purpose of this script?
   - How to run the script?
   - What prerequisites does the script require (e.g., git repository, specific 
directory structure)?
   - What is the script's output?
   - When should this script be run?
   
   **Potential Risks**:
   - Risk 1: New contributors may not be aware of this tool's existence and 
manually update changelogs
   - Risk 2: Incorrect usage of the script (e.g., running in the wrong 
directory) leads to unexpected results
   - Risk 3: Maintainers find it difficult to understand the script's expected 
behavior, increasing maintenance difficulty
   
   **Impact Scope**:
   - Direct impact: Script discoverability and usability
   - Indirect impact: Changelog update efficiency and quality
   - Scope: Documentation tool
   
   **Severity**: MINOR
   
   **Improvement Suggestions**:
   ```python
   #  Licensed to the Apache Software Foundation (ASF) under one or more
   #  contributor license agreements.  See the NOTICE file distributed with
   #  ...
   #  limitations under the License.
   
   """
   Update Connector Changelog Generator
   
   This script automatically generates changelog markdown files for SeaTunnel 
connectors
   by scanning git commit history and mapping commits to version tags.
   
   Usage:
       python3 tools/documents/update_connector_change_log.py
   
   Requirements:
       - Must be run from the SeaTunnel repository root
       - Git repository must have fetch access to 
https://github.com/apache/seatunnel.git
       - Documentation directory structure: docs/en/connectors/changelog/
   
   Output:
       - Generates/updates markdown files in docs/en/connectors/changelog/
       - Generates/updates markdown files in docs/zh/connectors/changelog/
       - Each connector gets a file named: connector-{name}.md
   
   Behavior:
       - Scans seatunnel-connectors-v2/ directory for connector changes
       - Fetches git tags to map commits to versions
       - Filters out maven-release-plugin commits
       - Marks unreleased commits as 'dev' version
       - Writes changelog in Markdown table format
   
   Example:
       After adding new features to connector-jdbc, run this script to update
       docs/en/connectors/changelog/connector-jdbc.md with the latest commits.
   
   Author: Apache SeaTunnel Community
   """
   
   import os
   import subprocess
   import html
   from packaging.version import Version
   from pathlib import Path
   ```
   
   **Rationale**: Adding a detailed docstring can:
   1. Help users quickly understand the script's purpose
   2. Provide clear usage guidance
   3. Explain the script's prerequisites and expected behavior
   4. Lower the barrier to maintenance and onboarding
   
   ---
   
   ### Issue 4: Missing logging and progress feedback
   
   **Location**: `tools/documents/update_connector_change_log.py:93-119`
   ```python
   def main():
       changes = generate_log_info()
       commit_version_map = get_tag_commit_list()
       in_release = get_current_branch_name().endswith('-release')
       directory = 
os.path.dirname(os.path.abspath(Path(__file__).parent.parent))
       changelog_dir = os.path.join(directory, 'docs', 'en', 'connectors', 
'changelog')
       zh_changelog_dir = os.path.join(directory, 'docs', 'zh', 'connectors', 
'changelog')
       for connector, prs in changes.items():
           write_commit(connector, prs, changelog_dir, commit_version_map, 
in_release)
           write_commit(connector, prs, zh_changelog_dir, commit_version_map, 
in_release)
   ```
   
   **Related Context**:
   - Functions: `main`, `generate_log_info`, `write_commit`
   - Output: 95 changelog files
   
   **Problem Description**:
   The script does not output any logs or progress information during 
execution, so users cannot know:
   - Is the script running?
   - How many connectors have been scanned?
   - Which connector is being processed?
   - Did it complete successfully?
   - If it failed, at which step did it fail?
   
   **Potential Risks**:
   - Risk 1: The script appears to "freeze", and users may interrupt execution
   - Risk 2: Difficult to track execution progress in CI/CD environments
   - Risk 3: The script fails silently, and users don't know whether it 
succeeded
   
   **Impact Scope**:
   - Direct impact: User experience and debugging experience
   - Indirect impact: CI/CD observability
   - Scope: Utility script
   
   **Severity**: MINOR
   
   **Improvement Suggestions**:
   ```python
   import logging
   
   # Configure logging
   logging.basicConfig(
       level=logging.INFO,
       format='%(asctime)s - %(levelname)s - %(message)s'
   )
   logger = logging.getLogger(__name__)
   
   def main():
       logger.info("Starting connector changelog update...")
       
       logger.info("Scanning connector changes...")
       changes = generate_log_info()
       logger.info(f"Found {len(changes)} connectors with changes")
       
       logger.info("Fetching git tags and version mapping...")
       commit_version_map = get_tag_commit_list()
       logger.info(f"Mapped {len(commit_version_map)} commits to versions")
       
       in_release = get_current_branch_name().endswith('-release')
       directory = 
os.path.dirname(os.path.abspath(Path(__file__).parent.parent))
       changelog_dir = os.path.join(directory, 'docs', 'en', 'connectors', 
'changelog')
       zh_changelog_dir = os.path.join(directory, 'docs', 'zh', 'connectors', 
'changelog')
       
       logger.info(f"Writing changelogs to: {changelog_dir}")
       
       success_count = 0
       failure_count = 0
       for i, (connector, prs) in enumerate(changes.items(), 1):
           try:
               logger.info(f"[{i}/{len(changes)}] Processing {connector} 
({len(prs)} commits)")
               write_commit(connector, prs, changelog_dir, commit_version_map, 
in_release)
               write_commit(connector, prs, zh_changelog_dir, 
commit_version_map, in_release)
               success_count += 1
           except Exception as e:
               logger.error(f"Failed to process {connector}: {e}")
               failure_count += 1
       
       logger.info(f"Completed: {success_count} succeeded, {failure_count} 
failed")
       if failure_count > 0:
           logger.warning(f"Some connectors failed to update. Please check the 
errors above.")
   ```
   
   **Rationale**: Adding logging can:
   1. Let users understand the script's execution progress
   2. Provide sufficient context information when failures occur
   3. Facilitate tracking execution in CI/CD environments
   4. Help quickly identify problems
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to