This is an automated email from the ASF dual-hosted git repository.
lidongdai pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new 2df548021d [Docs] Add LLM contribution guide (#10411)
2df548021d is described below
commit 2df548021d3bbb44d49e0d86c404c80c96bb2e61
Author: corgy-w <[email protected]>
AuthorDate: Mon Feb 9 21:32:31 2026 +0800
[Docs] Add LLM contribution guide (#10411)
Co-authored-by: David Zollo <[email protected]>
---
AGENTS.md | 252 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CLAUDE.md | 1 +
GEMINI.md | 1 +
3 files changed, 254 insertions(+)
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000000..95e6ae00d0
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,252 @@
+# LLM Context Guide for Apache SeaTunnel
+
+This guide helps AI assistants (LLMs / Agents) make **safe, consistent, and verifiable** changes to the Apache SeaTunnel codebase. It mirrors practices from mature Apache projects and adapts them to SeaTunnel’s **build, testing, architecture, and documentation conventions**.
+
+## ⚠️ CRITICAL: Validate Before Proposing Changes
+
+**Agents MUST run verification commands locally before suggesting or finalizing changes.**
+
+```bash
+# Format code (mandatory)
+./mvnw spotless:apply
+
+# Quick verification (mandatory)
+./mvnw -q -DskipTests verify
+
+# Unit tests (strongly recommended)
+./mvnw test
+```
+
+Failure to meet these requirements will likely result in PR rejection.
+
+## Git Commit Message Convention
+
+SeaTunnel follows a **strict commit message format** to maintain a clean and searchable history.
+
+**Format**:
+
+```
+[Type][Module] Description
+```
+
+### Types
+
+* `Feature` – New features
+* `Fix` – Bug fixes
+* `Improve` – Improvements to existing behavior
+* `Docs` – Documentation-only changes
+* `Test` – Test cases or test framework changes
+* `Chore` – Build, dependency, or maintenance tasks
+
+### Modules
+
+* `Connector-V2` – seatunnel-connectors-v2
+* `Zeta` – seatunnel-engine (Zeta engine)
+* `Core` – seatunnel-core
+* `API` – seatunnel-api
+* `Transform-V2` – seatunnel-transforms-v2
+* `Format` – seatunnel-formats
+* `Translation` – seatunnel-translation
+* `E2E` – seatunnel-e2e
+
+### Examples
+
+* `[Fix][Connector-V2] Fix MySQL source split enumeration bug`
+* `[Fix][Zeta] Fix checkpoint timeout under heavy backpressure`
+* `[Feature][Transform-V2] Add LLM transform plugin`
+* `[Improve][Core] Optimize jar package loading speed`
+* `[Docs] Update quick start guide`
+
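The format above can be checked mechanically. The following is a minimal sketch, not part of SeaTunnel: the class name `CommitMessageCheck` and its method are hypothetical, and treating the module tag as optional is an assumption drawn from the `[Docs] Update quick start guide` example.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of SeaTunnel): checks the "[Type][Module] Description" shape. */
final class CommitMessageCheck {
    // Types and modules are copied from the lists above. The module tag is
    // treated as optional (an assumption) because one listed example omits it.
    private static final Pattern FORMAT = Pattern.compile(
            "^\\[(Feature|Fix|Improve|Docs|Test|Chore)\\]"
                    + "(\\[(Connector-V2|Zeta|Core|API|Transform-V2|Format|Translation|E2E)\\])?"
                    + " \\S.*$");

    private CommitMessageCheck() {}

    /** Returns true when the commit subject line matches the convention. */
    static boolean isValid(String subjectLine) {
        return subjectLine != null && FORMAT.matcher(subjectLine).matches();
    }
}
```

Under these assumptions, `CommitMessageCheck.isValid("[Fix][Zeta] Fix checkpoint timeout under heavy backpressure")` is accepted, while a subject such as `fix mysql bug` is rejected.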
+## Repository Structure
+
+```text
+seatunnel/
+├── seatunnel-api/ # Core API definitions
+├── seatunnel-connectors-v2/ # Source & Sink connectors (main contribution area)
+├── seatunnel-transforms-v2/ # Transform plugins (including LLM)
+├── seatunnel-engine/ # Zeta engine & Web UI
+├── seatunnel-core/ # Job submission & CLI entry points
+├── seatunnel-translation/ # Flink & Spark adapters
+├── seatunnel-formats/ # Data formats (JSON, Avro, etc.)
+├── seatunnel-e2e/ # End-to-End integration tests
+├── docs/ # Documentation (en & zh)
+└── config/ # Default configurations
+```
+
+## Code Standards
+
+### Java Backend
+
+* **Formatting**: Google Java Format (AOSP style), enforced by Spotless
+* **Imports**:
+ * No wildcard imports
+ * Use shaded dependencies: `org.apache.seatunnel.shade.*`
+* **Nullability**: Avoid implicit null assumptions
+* **Visibility**: Keep APIs minimal; prefer package-private when possible
+* **Comments**: Add comments for important methods: public APIs, lifecycle hooks (initialization, start/stop, checkpoint), and complex or performance-critical logic. Example:
+
+```java
+/**
+ * Enumerates source splits for parallel reading.
+ * Called once during job initialization.
+ *
+ * @param context Split enumeration context
+ * @return Collection of discovered splits
+ */
+@Override
+public List<SourceSplit> enumerateSplits(SplitEnumerationContext context) {
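+    // Implementation goes here; the throw is a placeholder so the example compiles
+    throw new UnsupportedOperationException("Not implemented in this example");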
+}
+```
+
+### Apache License Header (MANDATORY)
+
+All **new files** MUST include the ASF license header:
+
+```java
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+```
+
+## 🚨 Backward Compatibility (VERY IMPORTANT)
+
+Agents MUST treat backward compatibility as a **hard constraint**.
+
+* DO NOT remove or rename existing config options
+* DO NOT change default values casually
+* DO NOT break public APIs or SPI contracts
+
+Any incompatible change MUST:
+
+* Be explicitly documented in `docs/en/introduction/concepts/incompatible-changes.md`
+* Include migration guidance
+* Be clearly explained in the PR description
+
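One common way to introduce a better option name without breaking existing job configs is to keep reading the old key as a fallback. The sketch below illustrates that idea in plain Java. It is not SeaTunnel's `Option` API: `CompatibleConfigReader` and the key names in the usage note are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not SeaTunnel's Option API): reads a renamed key without breaking old configs. */
final class CompatibleConfigReader {
    private CompatibleConfigReader() {}

    /** Returns the value under newKey, otherwise the value under legacyKey, otherwise empty. */
    static Optional<String> read(Map<String, String> config, String newKey, String legacyKey) {
        String value = config.get(newKey);
        if (value == null) {
            // Existing jobs that still use the legacy key keep working.
            value = config.get(legacyKey);
        }
        return Optional.ofNullable(value);
    }
}
```

For example, with the made-up keys `batch.size` (new) and `batch_size` (legacy), a config that sets only `batch_size` still resolves, and the new key wins when both are present.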
+## Dependency Rules
+
+* DO NOT introduce new dependencies unless absolutely necessary
+* Prefer existing shaded dependencies under `org.apache.seatunnel.shade.*`
+* Any new dependency MUST:
+ * Be justified in the PR description
+ * Consider shading, size, and conflict risks
+
+## Architecture Guidelines
+
+### Connector (V2)
+
+* Implement `SeaTunnelSource` or `SeaTunnelSink`
+* Define configs using `Option`
+* Support parallelism via `SourceSplitEnumerator`
+* Avoid connector-specific logic leaking into engine or core
+
+### Zeta Engine
+
+* **Client**: Submits job config
+* **Master**: Schedules & coordinates
+* **Worker**: Executes tasks (Source → Transform → Sink)
+
+Respect task boundaries and lifecycle semantics.
+
+## Configuration (Option) Rules
+
+* All user-facing configs MUST be defined using `Option`
+* Each option MUST include:
+ * name
+ * type
+ * default value (if applicable)
+ * clear description
+* Option names are **stable contracts** and must not be renamed lightly
+
+## Error Handling & Logging
+
+* Exceptions MUST include sufficient context (table, task, config key)
+* Avoid swallowing exceptions
+* Use proper log levels:
+ * INFO – lifecycle events
+ * WARN – recoverable issues
+ * ERROR – task-failing errors
+* NEVER log sensitive information (passwords, tokens, credentials)
+
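The rules above on exception context and sensitive values can be combined in one place. This is a minimal sketch under stated assumptions: `ErrorContext`, its methods, the set of sensitive key names, and the message layout are all hypothetical and not taken from the SeaTunnel codebase.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of SeaTunnel): builds contextual, secret-free error messages. */
final class ErrorContext {
    // Assumed list of sensitive key names; a real connector would define its own.
    private static final Set<String> SENSITIVE_KEYS = Set.of("password", "token", "secret");

    private ErrorContext() {}

    /** Describes the failing table, task, and config, masking sensitive values. */
    static String describe(String table, int taskId, Map<String, String> config) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            boolean sensitive = SENSITIVE_KEYS.contains(entry.getKey().toLowerCase(Locale.ROOT));
            safe.put(entry.getKey(), sensitive ? "******" : entry.getValue());
        }
        return "table=" + table + ", task=" + taskId + ", config=" + safe;
    }

    /** Wraps the cause instead of swallowing it, and attaches the context. */
    static IllegalStateException writeFailure(
            String table, int taskId, Map<String, String> config, Throwable cause) {
        return new IllegalStateException(
                "Failed to write batch: " + describe(table, taskId, config), cause);
    }
}
```

The original exception is preserved as the cause, so the stack trace stays intact while the message names the table, task, and non-sensitive config values.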
+## Documentation Rules
+
+* Any user-visible change MUST update:
+
+ * `docs/en`
+ * `docs/zh`
+* Config names, defaults, and examples MUST match the code exactly
+* Documentation is part of the feature, not an afterthought
+
+## Testing Guidelines
+
+### Unit Tests
+
+* Located under `src/test/java`
+* Validate behavior, not implementation details
+* Prefer deterministic and minimal tests
+
+Command:
+
+```bash
+./mvnw test
+```
+
+### E2E Tests
+
+* Located in `seatunnel-e2e`
+* Uses Testcontainers
+* Extend `TestSuiteBase`
+
+Command:
+
+```bash
+./mvnw -DskipUT -DskipIT=false verify
+```
+
+## Performance Awareness
+
+Agents MUST consider performance implications:
+
+* Avoid unnecessary object creation in hot paths
+* Be cautious with large in-memory buffers
+* Consider parallelism and resource usage
+
+## PR Scope Rule
+
+* Keep changes minimal and focused
+* Avoid unrelated refactors or formatting-only changes
+* One PR should solve **one problem**
+
+## Running & Debugging
+
+### Build from Source
+
+```bash
+./mvnw clean install -DskipTests -Dskip.spotless=true
+```
+
+### Install Connectors
+
+```bash
+sh bin/install-plugin.sh $current_version
+```
+
+### Run Job (Zeta)
+
+```bash
+sh bin/seatunnel.sh --config config/v2.batch.config.template -e local
+```
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000000..47dc3e3d86
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/GEMINI.md b/GEMINI.md
new file mode 120000
index 0000000000..47dc3e3d86
--- /dev/null
+++ b/GEMINI.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file