This is an automated email from the ASF dual-hosted git repository.
chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git
The following commit(s) were added to refs/heads/main by this push:
new 512c537fc docs: add AGENT.md to make AI coding more efficient (#2646)
512c537fc is described below
commit 512c537fc093654502e2ce2e0b572a0c6c904b70
Author: Shawn Yang <[email protected]>
AuthorDate: Tue Sep 23 17:50:18 2025 +0800
docs: add AGENT.md to make AI coding more efficient (#2646)
## Why?
Add CLAUDE.md to make AI coding more efficient; CLAUDE.md makes AI
coding with any model easier.
## What does this PR do?
Add AGENTS.md, with CLAUDE.md as a symlink to it.
## Related issues
This document is inspired by
https://github.com/apache/opendal/blob/main/CLAUDE.md
cc @Xuanwo
## Does this PR introduce any user-facing change?
- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
## Benchmark
---
AGENTS.md | 522 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CLAUDE.md | 1 +
2 files changed, 523 insertions(+)
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..9db7e7f83
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,522 @@
+# AGENTS.md
+
+This file provides comprehensive guidance to AI coding agents when working with the Apache Fory codebase.
+
+## Core Principles
+
+While working on Fory, please remember:
+
+- **Performance First**: Performance is the top priority. Never introduce code that reduces performance without explicit justification.
+- **English Only**: Always use English in code, comments, and documentation.
+- **Meaningful Comments**: Only add comments when the code's behavior is difficult to understand or when documenting complex algorithms.
+- **Focused Testing**: Only add tests that verify internal behaviors or fix specific bugs; don't create unnecessary tests unless requested.
+- **Git-Tracked Files**: When reading code, skip files not tracked by git by default, unless you generated them yourself.
+- **Cross-Language Consistency**: Maintain consistency across language implementations while respecting language-specific idioms.
+
+## Build and Development Commands
+
+### Java Development
+
+- All maven commands must be executed within the `java` directory.
+- All changes to `java` must pass the code style check and tests.
+- Fory java needs JDK `17+` installed.
+
+```bash
+# Clean the build
+mvn -T16 clean
+
+# Build
+mvn -T16 package
+
+# Install
+mvn -T16 install -DskipTests
+
+# Code format check
+mvn -T16 spotless:check
+
+# Code format
+mvn -T16 spotless:apply
+
+# Code style check
+mvn -T16 checkstyle:check
+
+# Run tests
+mvn -T16 test
+
+# Run specific tests
+mvn -T16 test -Dtest=org.apache.fory.TestClass#testMethod
+```
+
+### C++ Development
+
+- All commands must be executed within the `cpp` directory.
+
+```bash
+# Prepare for build
+pip install pyarrow==15.0.0
+
+# Build C++ library
+bazel build //...
+
+# Run tests
+bazel test $(bazel query //...)
+
+# Run specific test
+bazel test //fory/util:buffer_test
+```
+
+### Python Development
+
+- All commands must be executed within the `python` directory.
+- All changes to `python` must pass the code style check and tests.
+- When running tests, you can use the `ENABLE_FORY_CYTHON_SERIALIZATION` environment variable to enable or disable Cython serialization.
+- When debugging protocol-related issues, first run with `ENABLE_FORY_CYTHON_SERIALIZATION=0` to verify the behavior.
+- Fory python needs CPython `3.8+` installed, although some modules such as `fory-core` use `java8`.
+
+```bash
+# clean build
+rm -rf build dist .pytest_cache
+bazel clean --expunge
+
+# Code format
+ruff format .
+ruff check --fix .
+
+# Install
+pip install -v -e .
+
+# Build native extension
+bazel build //:cp_fory_so --config=x86_64 # For x86_64
+bazel build //:cp_fory_so --copt=-fsigned-char # For arm64 and aarch64
+
+# Run tests without cython
+ENABLE_FORY_CYTHON_SERIALIZATION=0 pytest -v -s .
+# Run tests with cython
+ENABLE_FORY_CYTHON_SERIALIZATION=1 pytest -v -s .
+```
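+
+Because protocol bugs can behave differently in the two modes, it can help to run the same command with Cython serialization disabled and then enabled. The sketch below assumes bash; `run_in_both_modes` is a hypothetical helper built only on the `ENABLE_FORY_CYTHON_SERIALIZATION` variable described above.
+
+```bash
+# Hypothetical helper: run a command in pure Python mode (0), then in
+# Cython mode (1), stopping at the first failing run.
+run_in_both_modes() {
+  for mode in 0 1; do
+    echo "== ENABLE_FORY_CYTHON_SERIALIZATION=$mode"
+    ENABLE_FORY_CYTHON_SERIALIZATION=$mode "$@" || return 1
+  done
+}
+
+# Usage from the python directory:
+# run_in_both_modes pytest -v -s .
+```
+
+Running pure Python mode first follows the debugging advice above: if the pure Python run passes and the Cython run fails, the problem is more likely specific to the Cython code path.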
+
+### Golang Development
+
+- All commands must be executed within the `go/fory` directory.
+- All changes to `go` must pass the format check and tests.
+- Go implementation focuses on reflection-based and codegen-based serialization.
+
+```bash
+# Format code
+go fmt ./...
+
+# Run tests
+go test -v
+
+# Run tests with race detection
+go test -race -v
+
+# Build
+go build
+
+# Run linter (if golangci-lint is installed)
+golangci-lint run
+
+# Generate code (if using go:generate)
+go generate ./...
+```
+
+### Rust Development
+
+- All cargo commands must be executed within the `rust` directory.
+- All changes to `rust` must pass the clippy check and tests.
+
+```bash
+# Check code
+cargo check
+
+# Build
+cargo build
+
+# Run linter for all targets and features
+cargo clippy --all-targets --all-features -- -D warnings
+
+# Run tests (requires test features)
+cargo test --features tests
+
+# Format code
+cargo fmt
+
+# Check formatting
+cargo fmt --check
+
+# Build documentation
+cargo doc --lib --no-deps --all-features
+
+# Run benchmarks
+cargo bench
+```
+
+### JavaScript/TypeScript Development
+
+- All commands must be executed within the `javascript` directory.
+- Uses npm/yarn for package management.
+
+```bash
+# Install dependencies
+npm install
+
+# Run tests
+node ./node_modules/.bin/jest --ci --reporters=default --reporters=jest-junit
+
+# Lint TypeScript sources
+git ls-files -- '*.ts' | xargs -P 5 node ./node_modules/.bin/eslint
+```
+
+### Dart Development
+
+- All commands must be executed within the `dart` directory.
+- Uses pub for package management.
+
+```bash
+# First, generate necessary code
+dart run build_runner build
+
+# Run all tests
+dart test
+
+# Analyze code and apply automated fixes
+dart analyze
+dart fix --dry-run
+dart fix --apply
+```
+
+### Kotlin Development
+
+- All maven commands must be executed within the `kotlin` directory.
+- Kotlin implementation provides extra serializers for kotlin types.
+- Kotlin implementation is built on fory java, so install the java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If fory java has not changed since it was last installed, you can skip this step.
+
+```bash
+# Build
+mvn clean package
+
+# Run tests
+mvn test
+```
+
+### Scala Development
+
+- All commands must be executed within the `scala` directory.
+- Scala implementation provides extra serializers for Scala types.
+- Scala implementation is built on fory java, so install the java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If fory java has not changed since it was last installed, you can skip this step.
+
+```bash
+# Build with sbt
+sbt compile
+
+# Run tests
+sbt test
+
+# Format code
+sbt scalafmt
+```
+
+### Integration Tests
+
+- All commands must be executed within the `integration_tests` directory.
+- For java-related integration tests, install the java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If fory java has not changed since it was last installed, you can skip this step.
+
+```bash
+it_dir=$(pwd)
+# Run graalvm tests
+cd $it_dir/graalvm_tests && mvn -T16 -DskipTests=true -Pnative package && target/main
+
+# Run latest_jdk_tests
+cd $it_dir/latest_jdk_tests && mvn -T16 test
+
+# Run JDK compatibility tests
+cd $it_dir/jdk_compatibility_tests && mvn -T16 test
+
+# Run JPMS tests
+cd $it_dir/jpms_tests && mvn -T16 test
+
+# Run Python benchmarks
+cd $it_dir/cpython_benchmark && pip install -r requirements.txt && python benchmark.py
+```
+
+### Documentation and Formatting
+
+- **Markdown Formatting**: When updating markdown documentation, use `prettier --write $file` to format.
+- **API Documentation**: When updating important public APIs, update documentation under `docs/`.
+- **Protocol Specifications**: `docs/specification/**` contains Fory protocol specifications. Read these documents carefully before making protocol changes.
+- **User Guides**: `docs/guide/**` contains user guides for different features and languages.
+
+## Repository Structure Understanding
+
+### Key Directories
+
+- **`docs/`**: Documentation, specifications, and guides
+ - `docs/specification/`: Protocol specifications (critical for understanding)
+ - `docs/guide/`: User guides and development guides
+ - `docs/benchmarks/`: Performance benchmarks documentation
+
+- **Language Implementations**:
+ - `java/`: Java implementation (maven-based, multi-module)
+ - `python/`: Python implementation (pip/setuptools + bazel)
+ - `cpp/`: C++ implementation (bazel-based)
+ - `go/`: Go implementation (go modules)
+ - `rust/`: Rust implementation (cargo-based)
+ - `javascript/`: JavaScript/TypeScript implementation (npm-based)
+ - `dart/`: Dart implementation (pub-based)
+ - `kotlin/`: Kotlin implementation (maven-based)
+ - `scala/`: Scala implementation (sbt-based)
+
+- **Testing and CI**:
+ - `integration_tests/`: Cross-language integration tests
+ - `.github/workflows/`: GitHub Actions CI/CD workflows
+ - `ci/`: CI scripts and configurations
+
+- **Build Configuration**:
+ - `BUILD`, `WORKSPACE`: Bazel configuration
+ - `.bazelrc`, `.bazelversion`: Bazel settings
+ - Various `pom.xml`, `package.json`, `Cargo.toml`, etc.
+
+### Important Files
+
+- **`AGENTS.md`**: This file - AI coding guidance
+- **`CLAUDE.md`**: Claude Code specific instructions
+- **`CONTRIBUTING.md`**: Contribution guidelines
+- **`README.md`**: Project overview and quick start
+- **`.gitignore`**: Git ignore patterns (includes build dirs)
+- **`licenserc.toml`**: License header configuration
+
+## Architecture Overview
+
+Apache Fory is a blazingly fast multi-language serialization framework that revolutionizes data exchange between systems and languages. By leveraging JIT compilation, code generation and zero-copy techniques, Fory delivers up to 170x faster performance compared to other serialization frameworks while being extremely easy to use.
+
+### Binary Protocols
+
+Fory uses binary protocols for efficient serialization and deserialization. Fory designs and implements multiple binary protocols for different scenarios:
+
+- **[xlang serialization format](docs/specification/xlang_serialization_spec.md)**:
+ - Automatically serializes any object across languages, with no need for IDL definitions, schema compilation, or object to/from protocol conversion.
+ - Supports optional shared and circular references, with no duplicate data or recursion errors.
+ - Supports object polymorphism.
+- **[Row format](docs/specification/row_format_spec.md)**: A cache-friendly binary random access format that supports skipping serialization and partial serialization, and can convert to column format automatically.
+- **[Java serialization format](docs/specification/java_serialization_spec.md)**: Highly optimized, drop-in replacement for Java serialization.
+- **Python serialization format**: Highly optimized, drop-in replacement for Python pickle, built as an extension of the **[xlang serialization format](docs/specification/xlang_serialization_spec.md)**.
+
+**`docs/specification/**` contains the specifications for the Fory protocols**. Read those documents carefully and make sure you understand them before changing code or documentation.
+
+### Core Structure
+
+Fory serialization is implemented independently for every language to minimize the costs of object memory layout interoperability, object allocation, and memory access, thereby maximizing performance. There is no code reuse between languages except for `fory python`, which reuses code from `fory c++`.
+
+#### Java
+
+- **fory-core**: Java library implementing the core object graph serialization
+ - `java/fory-core/src/main/java/org/apache/fory/Fory.java`: main serialization entry point
+ - `java/fory-core/src/main/java/org/apache/fory/resolver/TypeResolver.java`: type resolution and serializer dispatch
+ - `java/fory-core/src/main/java/org/apache/fory/resolver/RefResolver.java`: class for resolving shared/circular references when ref tracking is enabled
+ - `java/fory-core/src/main/java/org/apache/fory/serializer`: serializers for each supported type
+ - `java/fory-core/src/main/java/org/apache/fory/codegen`: code generators that provide an expression abstraction and compile expression trees to Java source code and bytecode
+ - `java/fory-core/src/main/java/org/apache/fory/builder`: builds expression trees for serialization, from which serialization code is generated
+ - `java/fory-core/src/main/java/org/apache/fory/reflect`: reflection utilities
+ - `java/fory-core/src/main/java/org/apache/fory/type`: Java generics and type inference utilities
+ - `java/fory-core/src/main/java/org/apache/fory/util`: utility classes
+
+- **fory-format**: Java library implementing the core row format encoding and decoding
+ - `java/fory-format/src/main/java/org/apache/fory/format/row`: row format data structures
+ - `java/fory-format/src/main/java/org/apache/fory/format/encoder`: generates row format encoders and decoders that encode/decode objects to/from row format
+ - `java/fory-format/src/main/java/org/apache/fory/format/type`: type inference for row format
+ - `java/fory-format/src/main/java/org/apache/fory/format/vectorized`: interoperation with the Apache Arrow columnar format
+
+- **fory-extensions**: extension libraries for java, including:
+ - Protobuf serializers for fory java native object graph protocol.
+ - Meta compression based on zstd
+
+- **fory-simd**: SIMD-accelerated serialization and deserialization based on the Java Vector API
+ - `java/fory-simd/src/main/java/org/apache/fory/util`: SIMD utilities
+ - `java/fory-simd/src/main/java/org/apache/fory/serializer`: SIMD-accelerated serializers
+
+- **fory-test-core**: Core test utilities and data generators
+
+- **fory-testsuite**: Complex test suite for issues that were reported by users and are hard to reproduce with simple test cases
+
+- **benchmark**: Benchmark suite based on jmh
+
+#### Bazel
+
+The `bazel` directory provides build support for fory c++ and cython:
+
+- `bazel/arrow`: build rules to get Arrow shared libraries based on a bazel template
+- `grpc-cython-copts.patch/grpc-python.patch`: patches for grpc that add `pyx_library` for cython.
+
+#### C++
+
+- `cpp/fory/row`: Row format data structures
+- `cpp/fory/meta`: Compile-time reflection utilities for extracting struct field information.
+- `cpp/fory/encoder`: Row format encoder and decoder
+- `cpp/fory/columnar`: Interoperation between fory row format and the Apache Arrow columnar format
+- `cpp/fory/util`: Common utilities
+ - `cpp/fory/util/buffer.h`: Buffer for reading and writing data
+ - `cpp/fory/util/bit_util.h`: utilities for bit manipulation
+ - `cpp/fory/util/string_util.h`: String utilities
+ - `cpp/fory/util/status.h`: Status code for error handling
+
+#### Python
+
+Fory python has two implementations for the protocol:
+
+- **Python mode**: Pure python implementation of the `xlang serialization format`, used for debugging and testing only. This mode can be enabled by setting the `ENABLE_FORY_CYTHON_SERIALIZATION=0` environment variable.
+- **Cython mode**: Cython-based implementation of the `xlang serialization format`, which is used by default and performs better than pure python. This mode can be enabled by setting the `ENABLE_FORY_CYTHON_SERIALIZATION=1` environment variable.
+- **Python mode** and **Cython mode** reuse some code from each other to reduce code duplication.
+
+Code structure:
+
+- `python/pyfory/_serialization.pyx`: Core serialization logic and entry point for cython mode, based on the `xlang serialization format`
+- `python/pyfory/_fory.py`: Serialization entry point for pure python mode, based on the `xlang serialization format`
+- `python/pyfory/_registry.py`: Type registry, resolution and serializer dispatch for pure python mode, also used by cython mode. Cython mode uses a cache to reduce invocations of this module.
+- `python/pyfory/serializer.py`: Serializers for non-internal types
+- `python/pyfory/includes`: Cython headers for `c++` functions and classes.
+- `python/pyfory/resolver.py`: resolving shared/circular references when ref tracking is enabled in pure python mode
+- `python/pyfory/format`: Fory row format encoding and decoding, arrow columnar format interoperation
+- `python/pyfory/_util.pyx`: Buffer for reading/writing data, string utilities. Used by both `_serialization.pyx` and `python/pyfory/format`.
+
+#### Go
+
+Fory go provides reflection-based and codegen-based serialization and deserialization.
+
+- `go/fory/fory.go`: serialization entry point
+- `go/fory/resolver.go`: resolving shared/circular references when ref tracking is enabled
+- `go/fory/type.go`: type system and type resolution, serializer dispatch
+- `go/fory/slice.go`: serializers for `slice` type
+- `go/fory/map.go`: serializers for `map` type
+- `go/fory/set.go`: serializers for `set` type
+- `go/fory/struct.go`: serializers for `struct` type
+- `go/fory/string.go`: serializers for `string` type
+- `go/fory/buffer.go`: Buffer for reading/writing data
+- `go/fory/codegen`: code generators, invoked via `go:generate`, that generate serialization code to speed up serialization.
+- `go/fory/meta`: Meta string compression
+
+#### Rust
+
+Fory rust provides macro-based serialization and deserialization. Fory rust consists of:
+
+- **fory**: Main library entry point
+ - `rust/fory/src/lib.rs`: main library entry point to export API to users
+- **fory-core**: Core library for serialization and deserialization
+ - `rust/fory-core/src/fory.rs`: main serialization entry point
+ - `rust/fory-core/src/resolver/type_resolver.rs`: type resolution and registration
+ - `rust/fory-core/src/resolver/metastring_resolver.rs`: resolver for meta strings
+ - `rust/fory-core/src/resolver/context.rs`: context for reading/writing
+ - `rust/fory-core/src/buffer.rs`: buffer for reading/writing data
+ - `rust/fory-core/src/meta`: meta string compression, type meta encoding
+ - `rust/fory-core/src/serializer`: serializers for each supported type
+ - `rust/fory-core/src/row`: row format encoding and decoding
+- **fory-derive**: Rust macro-based codegen for serialization and deserialization
+ - `rust/fory-derive/src/object`: macro for serializing/deserializing structs
+ - `rust/fory-derive/src/fory_row`: macro for encoding/decoding row format
+
+#### Integration Tests
+
+`integration_tests` contains integration tests with the following modules:
+
+- **cpython_benchmark**: benchmark suite for fory python
+- **graalvm_tests**: test suite for fory java on GraalVM
+- **jdk_compatibility_tests**: test suite for fory serialization compatibility between multiple JDK versions
+- **latest_jdk_tests**: test suite for JDK `17+`
+
+## Key Development Guidelines
+
+### Performance Guidelines
+
+- **Performance First**: Never introduce code that reduces performance without explicit justification
+- **Zero-Copy**: Leverage zero-copy techniques when possible
+- **JIT Compilation**: Consider JIT compilation opportunities
+- **Memory Layout**: Optimize for cache-friendly memory access patterns
+
+### Code Quality
+
+- **Public APIs**: Must be well-documented and easy to understand
+- **Error Handling**: Implement comprehensive error handling with meaningful messages
+- **Type Safety**: Use strong typing and generics appropriately
+- **Null Safety**: Handle null values appropriately for each language
+
+### Cross-Language Considerations
+
+- **Protocol Compatibility**: Ensure serialization compatibility across languages
+- **Type Mapping**: Understand type mapping between languages (see `docs/guide/xlang_type_mapping.md`)
+- **Endianness**: Handle byte order correctly for cross-platform compatibility
+- **Version Compatibility**: Maintain backward compatibility when possible
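+
+As an illustration of byte order, the sketch below prints the little-endian encoding of an unsigned 32-bit integer. It assumes the Fory specifications use little-endian order for fixed-width integers; confirm this against `docs/specification/` before relying on it. `u32_le_hex` is a hypothetical helper.
+
+```bash
+# Hypothetical helper: print the bytes of an unsigned 32-bit integer in
+# little-endian order (least significant byte first) as hex.
+u32_le_hex() {
+  local v=$1
+  printf '%02x %02x %02x %02x\n' \
+    $(( v & 0xff )) $(( (v >> 8) & 0xff )) \
+    $(( (v >> 16) & 0xff )) $(( (v >> 24) & 0xff ))
+}
+
+u32_le_hex 258   # prints: 02 01 00 00
+```
+
+A reader that assumed big-endian order would decode those same four bytes as `0x02010000`, which is why both sides must agree on byte order.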
+
+### Testing Strategy
+
+- **Unit Tests**: Focus on internal behavior verification
+- **Integration Tests**: Use `integration_tests/` for cross-language compatibility
+- **Language Alignment and Protocol Compatibility**: Run `test_cross_language.py` to verify language and protocol alignment
+- **Performance Tests**: Include benchmarks for performance-critical changes
+
+### Documentation Requirements
+
+- **API Changes**: Update relevant documentation in `docs/`
+- **Protocol Changes**: Update specifications in `docs/specification/`
+- **Examples**: Provide working examples for new features
+- **Migration Guides**: Document breaking changes and migration paths
+
+## Development Workflow
+
+### Before Making Changes
+
+1. **Read Specifications**: Review relevant docs in `docs/specification/`
+2. **Understand Architecture**: Study the language-specific implementation structure
+3. **Check Existing Tests**: Look at existing test patterns and coverage
+4. **Review Related Issues**: Check GitHub issues for context
+
+### Making Changes
+
+1. **Follow Language Conventions**: Respect each language's idioms and patterns
+2. **Maintain Performance**: Profile performance-critical changes
+3. **Add Tests**: Include appropriate tests for new functionality
+4. **Update Documentation**: Update docs for API changes
+5. **Format Code**: Use language-specific formatters before committing
+
+## Debugging Guidelines
+
+### Protocol Issues
+
+- **Use Python Mode**: Set `ENABLE_FORY_CYTHON_SERIALIZATION=0` for debugging
+- **Check Specifications**: Refer to protocol specs in `docs/specification/`
+- **Cross-Language Testing**: Use integration tests to verify compatibility
+
+### Performance Issues
+
+- **Profile First**: Use appropriate profilers for each language
+- **Memory Analysis**: Check for memory leaks and allocation patterns
+
+### Build Issues
+
+- **Clean Builds**: Use language-specific clean commands
+- **Dependency Issues**: Check version compatibility
+- **Bazel Issues**: Use `bazel clean --expunge` for deep cleaning
+
+## CI/CD Understanding
+
+### GitHub Actions Workflows
+
+- **`ci.yml`**: Main CI workflow for all languages
+- **`build-native-*.yml`**: macOS/Windows python wheel build workflows
+- **`build-containerized-*.yml`**: Containerized python wheel build workflows for Linux
+- **`lint.yml`**: Code formatting and linting
+- **`pr-lint.yml`**: PR-specific checks
+
+## Commit Message Format
+
+Use conventional commits with language scope:
+
+```
+feat(java): add codegen support for xlang serialization
+fix(rust): fix collection header when collection is empty
+docs(python): add docs for xlang serialization
+refactor(java): unify serialization exceptions hierarchy
+perf(cpp): optimize buffer allocation in encoder
+test(integration): add cross-language reference cycle tests
+ci: update build matrix for latest JDK versions
+chore(deps): update arrow dependency to 15.0.0
+```
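+
+A minimal sketch of a local subject-line check, assuming bash and `grep -E`. The accepted types are only those used in the examples above, so the project may accept others; `is_conventional_subject` is a hypothetical helper, not the project's own CI check.
+
+```bash
+# Hypothetical helper: check "type(scope): description", scope optional.
+is_conventional_subject() {
+  printf '%s\n' "$1" | grep -Eq \
+    '^(feat|fix|docs|refactor|perf|test|ci|chore)(\([a-z0-9_-]+\))?: .+'
+}
+
+is_conventional_subject "perf(cpp): optimize buffer allocation in encoder" && echo "ok"
+```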
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 000000000..47dc3e3d8
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file