DanielCarter-stack commented on PR #10411:
URL: https://github.com/apache/seatunnel/pull/10411#issuecomment-3815813416

   <!-- code-pr-reviewer -->
   <!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10411", "part": 1, 
"total": 1} -->
   ### Issue 1: Missing Chinese version of documentation `AGENTS_ZH.md`
   
   **Location**: Root directory
   ```bash
   # Files promised but not created in PR
   AGENTS_ZH.md
   ```
   
   **Related Context**:
   - PR description explicitly states: "This PR introduces `AGENTS.md` (and its 
Chinese version `AGENTS_ZH.md`)"
   - Project documentation standards require: "Any user-visible change MUST 
update: `docs/en` AND `docs/zh`"
   - Existing Chinese documentation directory: `docs/zh/` exists with complete 
structure
   
   **Issue Description**:
   The PR description explicitly promises to create a Chinese version 
`AGENTS_ZH.md`, but the actual changes only create an English version 
`AGENTS.md` and three symlinks. This violates:
   1. The explicit promise in the PR description
   2. The project's own bilingual documentation standards (also mentioned in 
AGENTS.md lines 170-177)
   
   **Potential Risks**:
   - Risk 1: Chinese contributors using AI assistants may not receive optimal 
guidance
   - Risk 2: Violates the project's bilingual documentation convention, causing 
inconsistent documentation structure
   - Risk 3: May set a wrong example for other contributors to "ignore Chinese 
documentation"
   
   **Impact Scope**:
   - Direct impact: Chinese contributors and developers using Chinese AI models
   - Indirect impact: Project internationalization strategy and documentation 
consistency
   - Impact level: Project-level (documentation standards)
   
   **Severity**: MAJOR
   
   **Improvement Suggestions**:
   ```bash
   # Option 1: Create AGENTS_ZH.md (recommended)
   # Need to translate AGENTS.md to Chinese, keeping structure consistent
   
   # Option 2: Create symlink to English version (temporary solution)
   ln -s AGENTS.md AGENTS_ZH.md
   # But this is not a good long-term solution
   
   # Option 3: Defer to follow-up PR
   # Clearly state in PR description that Chinese version will be added in 
follow-up PR
   # Need to create Issue to track
   ```
   
   **Rationale**: The project has clear bilingual documentation standards, and 
the PR description promised a Chinese version. Suggest supplementing before 
merge, or explicitly stating the plan for the Chinese version in the PR 
description.
   
   ---
   
   ### Issue 2: Startup script path error
   
   **Location**: `AGENTS.md:233-237`
   ```markdown
   ### Run Job (Zeta)
   ```bash
   sh bin/seatunnel.sh --config config/v2.batch.config.template -e local
   ```
   ```
   
   ** **Related Context:
   - 实际 seatunnel.sh 
位置:`seatunnel-core/seatunnel-starter/src/main/bin/seatunnel.sh`
   - config 模板位置:`config/v2.batch.config.template` ✓ 存在
   - 文档声称的路径:`bin/seatunnel.sh` ✗ 不存在
   
   ** **Problem Description:
   文档中提供的运行命令使用了错误的路径。`bin/` 目录下只有 `install-plugin.sh`,没有 `seatunnel.sh`。实际的 
`seatunnel.sh` 位于 `seatunnel-core/seatunnel-starter/src/main/bin/`。
   
   这会导致:
   1. 新手直接复制命令会失败
   2. AI 助手可能生成错误的运行脚本
   3. 降低文档的可信度
   
   ** **Potential Risks:
   - 风险 1:开发者尝试运行命令时会报错 "No such file or directory"
   - 风险 2:浪费开发者时间排查路径问题
   - 风险 3:AI 助手可能学习到错误的路径模式
   
   ** **Impact Scope:
   - 直接影响:尝试运行 SeaTunnel 的开发者
   - 间接影响:文档可信度
   - 影响面:使用 Zeta 引擎的用户
   
   ** **Severity:** MAJOR
   
   ** **Improvement Suggestions:
   ```markdown
   ### Run Job (Zeta)
   
   **After building from source**, you can run a job using:
   
   ```bash
   
   # ## Issue 3: Incomplete directory structure description
   
   ** **Location:** `AGENTS.md:60-74`
   ```markdown
   ## Repository Structure
   
   ```text
   seatunnel/
   ├── seatunnel-api/              # Core API definitions
   ├── seatunnel-connectors-v2/    # Source & Sink connectors (main 
contribution area)
   ├── seatunnel-transforms-v2/    # Transform plugins (including LLM)
   ├── seatunnel-engine/           # Zeta engine & Web UI
   ├── seatunnel-core/             # Job submission & CLI entry points
   ├── seatunnel-translation/      # Flink & Spark adapters
   ├── seatunnel-formats/          # Data formats (JSON, Avro, etc.)
   ├── seatunnel-e2e/              # End-to-End integration tests
   ├── docs/                       # Documentation (en & zh)
   └── config/                     # Default configurations
   ```
   ```
   
   ** **Related Context:
   - 实际存在的模块(通过 `ls -d seatunnel-*` 确认):
     - `seatunnel-api` ✓
     - `seatunnel-connectors-v2` ✓
     - `seatunnel-transforms-v2` ✓
     - `seatunnel-engine` ✓
     - `seatunnel-core` ✓
     - `seatunnel-translation` ✓
     - `seatunnel-formats` ✓
     - `seatunnel-e2e` ✓
     - `seatunnel-common` ✗(未列出)
     - `seatunnel-config` ✗(未列出)
     - `seatunnel-dist` ✗(未列出)
     - `seatunnel-examples` ✗(未列出)
     - `seatunnel-plugin-discovery` ✗(未列出)
     - `seatunnel-shade` ✗(未列出)
     - `seatunnel-ci-tools` ✗(未列出)
   
   ** **Problem Description:
   文档中列出的目录结构不完整,遗漏了 7 个子模块。虽然文档说明是"Overview of key modules",但:
   1. 没有明确说明这是"非完整列表"
   2. `seatunnel-common` 是一个核心模块(公共工具类),被遗漏不太合理
   3. `seatunnel-dist` 与打包分发相关,也可能被贡献者接触到
   
   ** **Potential Risks:
   - 风险 1:AI 助手可能认为只有这 9 个模块,在实际修改时遗漏其他模块
   - 风险 2:贡献者在需要修改 `seatunnel-common` 时,可能不知道这个模块的存在
   - 风险 3:可能误导 AI 助手进行模块推荐
   
   ** **Impact Scope:
   - 直接影响:需要修改被遗漏模块的贡献者
   - 间接影响:AI 助手的模块推荐准确性
   - 影响面:多模块
   
   ** **Severity:** MINOR
   
   ** **Improvement Suggestions:
   ```markdown
   ## Repository Structure
   
   ```text
   seatunnel/
   ├── seatunnel-api/              # Core API definitions
   ├── seatunnel-common/           # Common utilities and shared code
   ├── seatunnel-config/           # Configuration module
   ├── seatunnel-connectors-v2/    # Source & Sink connectors (main 
contribution area)
   ├── seatunnel-transforms-v2/    # Transform plugins (including LLM)
   ├── seatunnel-engine/           # Zeta engine & Web UI
   ├── seatunnel-core/             # Job submission & CLI entry points
   ├── seatunnel-translation/      # Flink & Spark adapters
   ├── seatunnel-formats/          # Data formats (JSON, Avro, etc.)
   ├── seatunnel-e2e/              # End-to-End integration tests
   ├── seatunnel-dist/             # Distribution packaging
   ├── seatunnel-shade/            # Shaded dependencies
   ├── seatunnel-plugin-discovery/ # Plugin discovery mechanism
   ├── seatunnel-ci-tools/         # CI/CD tools
   ├── seatunnel-examples/         # Example configurations
   ├── docs/                       # Documentation (en & zh)
   └── config/                     # Default configurations
   ```
   
   **Key modules for contributors**:
   - **Connector-V2**: Most contributions happen here
   - **Transform-V2**: Data transformation plugins
   - **Zeta (seatunnel-engine)**: Engine development
   - **E2E**: Integration tests
   ```
   
   ** **Reasoning:
   1. 列出所有模块,提供完整的视图
   2. 添加简短描述说明每个模块的用途
   3. 强调"Key modules for contributors",帮助贡献者聚焦
   4. 保持简洁,但提供完整信息
   
   ---
   
   # ## Issue 4: Git commit format differs from actual history
   
   ** **Location:** `AGENTS.md:22-58`
   
   ** **Related Context:
   - 文档声称的格式:`[Type][Module] Description`
   - 实际历史中的变体:
     - `[Docs] Add LLM contribution guide`(无模块)✓ 符合
     - `[Fix][Connector-V2][Hive] ...`(三级标签)⚠️ 变体
     - `[Fix] [connector-hbase] ...`(小写+空格)⚠️ 变体
     - `[Feature] [Connector-file] ...`(空格)⚠️ 变体
     - `[Bug][Connector-V2] ...`(使用 Bug)⚠️ 类型不一致
   
   ** **Problem Description:
   文档中定义的提交格式是"理想格式",但项目实际历史中存在多种变体:
   1. 模块标签有时使用小写(如 `connector-hbase` vs `Connector-V2`)
   2. 有时使用三级标签(如 `[Fix][Connector-V2][Hive]`)
   3. 有时标签之间有空格(`[Fix] [connector-hbase]`)
   4. 类型有时使用 `Bug` 而非 `Fix`
   5. 有些提交没有模块标签(如 `[Docs] ...`)
   
   文档没有说明这些变体是否可接受,可能导致 AI 助手或贡献者困惑。
   
   ** **Potential Risks:
   - 风险 1:AI 助手可能拒绝接受历史中的合法提交格式
   - 风险 2:贡献者可能认为历史提交格式"错误"
   - 风险 3:缺乏明确的规范说明,导致新贡献者不知道该遵循哪个格式
   
   ** **Impact Scope:
   - 直接影响:Git 提交信息格式
   - 间接影响:提交历史的一致性
   - 影响面:项目级
   
   ** **Severity:** MINOR
   
   ** **Improvement Suggestions:
   ```markdown
   ## Git Commit Message Convention
   
   SeaTunnel follows a **standardized commit message format** to maintain a 
clean and searchable history.
   
   **Recommended Format**:
   
   ```
   [Type][Module] Description
   ```
   
   **Allowed Variations**:
   - `[Type] Description` (without module tag, for general changes)
   - `[Type][Module][Sub-module] Description` (for specific sub-components)
   
   **Type Values** (case-sensitive, prefer these):
   * `Feature`  – New features
   * `Fix`      – Bug fixes (also `Bug` is accepted but discouraged)
   * `Improve`  – Improvements to existing behavior
   * `Docs`     – Documentation-only changes
   * `Test`     – Test cases or test framework changes
   * `Chore`    – Build, dependency, or maintenance tasks
   
   **Module Values** (use consistent casing):
   * `Connector-V2`  – seatunnel-connectors-v2
   * `Zeta`          – seatunnel-engine (Zeta engine)
   * `Core`          – seatunnel-core
   * `API`           – seatunnel-api
   * `Transform-V2`  – seatunnel-transforms-v2
   * `Format`        – seatunnel-formats
   * `Translation`   – seatunnel-translation
   * `E2E`           – seatunnel-e2e
   
   **Examples**:
   * `[Fix][Connector-V2] Fix MySQL source split enumeration bug`
   * `[Feature][Transform-V2] Add LLM transform plugin`
   * `[Docs] Update quick start guide`
   * `[Improve][Core] Optimize jar package loading speed`
   
   **Historical Note**: Some historical commits use variations like lowercase 
module names (`connector-hbase`), extra spaces (`[Fix] [module]`), or `Bug` 
instead of `Fix`. New commits should follow the recommended format above.
   ```
   
   ** **Reasoning:
   1. 明确区分"推荐格式"和"允许变体"
   2. 说明历史差异,避免混淆
   3. 提供清晰的指导,告诉新贡献者应该怎么做
   4. 保持向后兼容性,承认历史格式的合理性
   
   ---
   
   # ## Issue 5: Documentation location may not be prominent enough
   
   ** **Location:** Root directory `AGENTS.md`
   
   ** **Related Context:
   - 文件放置在项目根目录
   - 与 `README.md` 同级
   - 没有在 `README.md` 中引用或提及
   - 没有在 `docs/` 目录中建立索引
   
   ** **Problem Description:
   虽然 `AGENTS.md` 放在根目录有助于 LLM 工具发现,但对于人类贡献者来说:
   1. 没有在 README 中提及,可能不容易被发现
   2. 不在 docs/ 目录下,可能不被视为"正式文档"
   3. 没有在文档导航中集成
   
   ** **Potential Risks:
   - 风险 1:人类贡献者可能不知道这个文档的存在
   - 风险 2:文档可能长期不更新,因为没人知道要更新它
   - 风险 3:与 `docs/` 目录的隔离可能导致内容重复或冲突
   
   ** **Impact Scope:
   - 直接影响:文档的可发现性
   - 间接影响:文档的维护和更新
   - 影响面:项目级
   
   ** **Severity:** MINOR
   
   ** **Improvement Suggestions:
   
   ** **Option 1: Add reference in README.md (recommended)
   ```markdown
   # Add to the "Contributing" section in README.md
   
   ## Contributing
   
   We welcome contributions! Please check out:
   - [Contribution Guide](docs/en/contribution/...) (for humans)
   - [LLM Context Guide](AGENTS.md) (for AI assistants like Claude, GPT, etc.)
   ```
   
   ** **Option 2: Add link in docs/ as well
   ```bash
   # Create docs/en/AGENTS.md (symlink or redirect)
   ln -s ../../AGENTS.md docs/en/AGENTS.md
   # Create docs/zh/AGENTS_ZH.md
   ln -s ../../AGENTS_ZH.md docs/zh/AGENTS_ZH.md
   ```
   
   ** **Option 3: Add to documentation sidebar
   在 `docs/sidebars.js` 中添加:
   ```javascript
   {
     type: 'link',
     label: 'For AI Assistants',
     href: '/en/AGENTS',
   }
   ```
   
   ** **Reasoning:
   1. 提高文档的可发现性
   2. 让人类贡献者也知道这个资源
   3. 建立与正式文档体系的连接
   4. 便于未来维护和更新
   
   ---
   
   # ## Issue 6: Missing CI/CD workflow documentation
   
   ** **Location:** `AGENTS.md` (overall)
   
   ** **Related Context:
   - 文档提到了本地验证命令(`./mvnw spotless:apply`, `./mvnw verify`, `./mvnw test`)
   - 但没有提及 CI/CD 流程
   - 项目有 `.github/workflows/` 目录(GitHub Actions)
   
   ** **Problem Description:
   现代开源项目通常有 CI/CD 流程自动检查 PR。文档中:
   1. 没有说明 PR 提交后 CI 会自动运行哪些检查
   2. 没有说明如何查看 CI 结果
   3. 没有说明 CI 失败后如何处理
   4. 没有说明如何在本地模拟 CI 检查
   
   ** **Potential Risks:
   - 风险 1:贡献者可能不知道 CI 会自动检查
   - 风险 2:贡献者可能不知道如何理解 CI 错误
   - 风险 3:AI 助手可能生成无法通过 CI 的代码
   
   ** **Impact Scope:
   - 直接影响:PR 提交流程
   - 间接影响:PR 合并效率
   - 影响面:所有贡献者
   
   ** **Severity:** MINOR
   
   ** **Improvement Suggestions:
   ```markdown
   ## CI/CD Process
   
   **Automated Checks**:
   When you open a PR, GitHub Actions will automatically run:
   - Spotless format check
   - Compilation (verify)
   - Unit tests
   - E2E tests (for some modules)
   
   **Viewing CI Results**:
   1. Go to your PR page
   2. Scroll to the "Checks" section at the bottom
   3. Click on the failed job to see detailed logs
   
   **Common CI Failures**:
   - **Spotless check failed**: Run `./mvnw spotless:apply` and push again
   - **Test failed**: Check the test logs, fix the issue, and push
   - **Compilation failed**: Check the compiler error logs
   
   **Running CI Locally** (optional):
   ```bash
   
   # ## Issue 7: E2E test command may be inaccurate
   
   ** **Location:** `AGENTS.md:193-203`
   
   ** **Related Context:
   - 文档声称:
     ```bash
     ./mvnw -DskipUT -DskipIT=false verify
     ```
   - 这个命令会跳过单元测试(skipUT),但运行集成测试
   
   ** **Problem Description:
   1. 命令名称叫 "E2E Tests",但实际使用 `-DskipIT=false`,这是运行集成测试,不一定是 E2E 测试
   2. E2E 测试可能位于 `seatunnel-e2e` 模块,需要特定的命令运行
   3. 文档没有说明如何运行特定模块的 E2E 测试
   
   ** **Potential Risks:
   - 风险 1:AI 助手可能无法正确区分 E2E 测试和集成测试
   - 风险 2:贡献者可能无法运行特定的 E2E 测试
   - 风险 3:测试运行失败时不知道如何排查
   
   ** **Impact Scope:
   - 直接影响:测试运行
   - 间接影响:PR 质量保证
   - 影响面:Connector-V2 和其他需要 E2E 测试的模块
   
   ** **Severity:** MINOR
   
   ** **Improvement Suggestions:
   ```markdown
   ### E2E Tests
   
   * Located in `seatunnel-e2e`
   * Uses Testcontainers for real environment testing
   * Extend `TestSuiteBase` for consistent test setup
   
   **Run all E2E tests**:
   ```bash
   ./mvnw -DskipUT -DskipIT=false verify
   ```
   
   **Run specific E2E test**:
   ```bash


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to