This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git


The following commit(s) were added to refs/heads/main by this push:
     new 512c537fc docs: add AGENT.md to make AI coding more efficient (#2646)
512c537fc is described below

commit 512c537fc093654502e2ce2e0b572a0c6c904b70
Author: Shawn Yang <[email protected]>
AuthorDate: Tue Sep 23 17:50:18 2025 +0800

    docs: add AGENT.md to make AI coding more efficient (#2646)
    
    ## Why?
    
    Add CLAUDE.md to make AI coding more efficient. CLAUDE.md will make AI
    coding with any model easier.
    
    ## What does this PR do?
    Add CLAUDE.md
    
    ## Related issues
    
    This document is inspired by
    https://github.com/apache/opendal/blob/main/CLAUDE.md
    
    cc @Xuanwo
    
    ## Does this PR introduce any user-facing change?
    
    - [ ] Does this PR introduce any public API change?
    - [ ] Does this PR introduce any binary protocol compatibility change?
    
---
 AGENTS.md | 522 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 CLAUDE.md |   1 +
 2 files changed, 523 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..9db7e7f83
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,522 @@
+# AGENTS.md
+
+This file provides comprehensive guidance to AI coding agents when working 
with the Apache Fory codebase.
+
+## Core Principles
+
+While working on Fory, please remember:
+
+- **Performance First**: Performance is the top priority. Never introduce code 
that reduces performance without explicit justification.
+- **English Only**: Always use English in code, comments, and documentation.
+- **Meaningful Comments**: Only add comments when the code's behavior is 
difficult to understand or when documenting complex algorithms.
+- **Focused Testing**: Only add tests that verify internal behaviors or fix 
specific bugs; don't create unnecessary tests unless requested.
+- **Git-Tracked Files**: When reading code, skip all files not tracked by git 
by default unless generated by yourself.
+- **Cross-Language Consistency**: Maintain consistency across language 
implementations while respecting language-specific idioms.
+
+## Build and Development Commands
+
+### Java Development
+
+- All maven commands must be executed within the `java` directory.
+- All changes to `java` must pass the code style check and tests.
+- Fory java needs JDK `17+` installed.
+
+```bash
+# Clean the build
+mvn -T16 clean
+
+# Build
+mvn -T16 package
+
+# Install
+mvn -T16 install -DskipTests
+
+# Code format check
+mvn -T16 spotless:check
+
+# Code format
+mvn -T16 spotless:apply
+
+# Code style check
+mvn -T16 checkstyle:check
+
+# Run tests
+mvn -T16 test
+
+# Run specific tests
+mvn -T16 test -Dtest=org.apache.fory.TestClass#testMethod
+```
+
+### C++ Development
+
+- All commands must be executed within the `cpp` directory.
+
+```bash
+# Prepare for build
+pip install pyarrow==15.0.0
+
+# Build C++ library
+bazel build //...
+
+# Run tests
+bazel test $(bazel query //...)
+
+# Run specific test
+bazel test //fory/util:buffer_test
+```
+
+### Python Development
+
+- All commands must be executed within the `python` directory.
+- All changes to `python` must pass the code style check and tests.
+- When running tests, you can use the `ENABLE_FORY_CYTHON_SERIALIZATION` 
environment variable to enable or disable cython serialization.
+- When debugging protocol related issues, you should use 
`ENABLE_FORY_CYTHON_SERIALIZATION=0` first to verify the behavior.
+- Fory python needs cpython `3.8+` installed although some modules such as 
`fory-core` use `java8`.
+
+```bash
+# clean build
+rm -rf build dist .pytest_cache
+bazel clean --expunge
+
+# Code format
+ruff format .
+ruff check --fix .
+
+# Install
+pip install -v -e .
+
+# Build native extension
+bazel build //:cp_fory_so --config=x86_64 # For x86_64
+bazel build //:cp_fory_so --copt=-fsigned-char # For arm64 and aarch64
+
+# Run tests without cython
+ENABLE_FORY_CYTHON_SERIALIZATION=0 pytest -v -s .
+# Run tests with cython
+ENABLE_FORY_CYTHON_SERIALIZATION=1 pytest -v -s .
+```
+
+### Golang Development
+
+- All commands must be executed within the `go/fory` directory.
+- All changes to `go` must pass the format check and tests.
+- Go implementation focuses on reflection-based and codegen-based 
serialization.
+
+```bash
+# Format code
+go fmt ./...
+
+# Run tests
+go test -v
+
+# Run tests with race detection
+go test -race -v
+
+# Build
+go build
+
+# Run linter (if golangci-lint is installed)
+golangci-lint run
+
+# Generate code (if using go:generate)
+go generate ./...
+```
+
+### Rust Development
+
+- All cargo commands must be executed within the `rust` directory.
+- All changes to `rust` must pass the clippy check and tests.
+
+```bash
+# Check code
+cargo check
+
+# Build
+cargo build
+
+# Run linter for all targets and features
+cargo clippy --all-targets --all-features -- -D warnings
+
+# Run tests (requires test features)
+cargo test --features tests
+
+# Format code
+cargo fmt
+
+# Check formatting
+cargo fmt --check
+
+# Build documentation
+cargo doc --lib --no-deps --all-features
+
+# Run benchmarks
+cargo bench
+```
+
+### JavaScript/TypeScript Development
+
+- All commands must be executed within the `javascript` directory.
+- Uses npm/yarn for package management.
+
+```bash
+# Install dependencies
+npm install
+
+# Run tests
+node ./node_modules/.bin/jest --ci --reporters=default --reporters=jest-junit
+
+# Lint code
+git ls-files -- '*.ts' | xargs -P 5 node ./node_modules/.bin/eslint
+```
+
+### Dart Development
+
+- All commands must be executed within the `dart` directory.
+- Uses pub for package management.
+
+```bash
+# First, generate necessary code
+dart run build_runner build
+
+# Run all tests
+dart test
+
+# Analyze code and apply automated fixes
+dart analyze
+dart fix --dry-run
+dart fix --apply
+```
+
+### Kotlin Development
+
+- All maven commands must be executed within the `kotlin` directory.
+- Kotlin implementation provides extra serializers for kotlin types.
+- Kotlin implementation is built on fory java, so install the java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If the java code has not changed since the last install, you can skip this step.
+
+```bash
+# Build
+mvn clean package
+
+# Run tests
+mvn test
+```
+
+### Scala Development
+
+- All commands must be executed within the `scala` directory.
+- Scala implementation provides extra serializers for Scala types.
+- Scala implementation is built on fory java, so install the java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If the java code has not changed since the last install, you can skip this step.
+
+```bash
+# Build with sbt
+sbt compile
+
+# Run tests
+sbt test
+
+# Format code
+sbt scalafmt
+```
+
+### Integration Tests
+
+- All commands must be executed within the `integration_tests` directory.
+- For java related integration tests, install the java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If the java code has not changed since the last install, you can skip this step.
+
+```bash
+it_dir=$(pwd)
+# Run graalvm tests
+cd $it_dir/graalvm_tests && mvn -T16 -DskipTests=true -Pnative package && target/main
+
+# Run latest_jdk_tests
+cd $it_dir/latest_jdk_tests && mvn -T16 test
+
+# Run JDK compatibility tests
+cd $it_dir/jdk_compatibility_tests && mvn -T16 test
+
+# Run JPMS tests
+cd $it_dir/jpms_tests && mvn -T16 test
+
+# Run Python benchmarks
+cd $it_dir/cpython_benchmark && pip install -r requirements.txt && python benchmark.py
+```
+
+### Documentation and Formatting
+
+- **Markdown Formatting**: When updating markdown documentation, use `prettier 
--write $file` to format.
+- **API Documentation**: When updating important public APIs, update 
documentation under `docs/`.
+- **Protocol Specifications**: `docs/specification/**` contains Fory protocol 
specifications. Read these documents carefully before making protocol changes.
+- **User Guides**: `docs/guide/**` contains user guides for different features 
and languages.
+
+## Repository Structure Understanding
+
+### Key Directories
+
+- **`docs/`**: Documentation, specifications, and guides
+  - `docs/specification/`: Protocol specifications (critical for understanding)
+  - `docs/guide/`: User guides and development guides
+  - `docs/benchmarks/`: Performance benchmarks documentation
+
+- **Language Implementations**:
+  - `java/`: Java implementation (maven-based, multi-module)
+  - `python/`: Python implementation (pip/setuptools + bazel)
+  - `cpp/`: C++ implementation (bazel-based)
+  - `go/`: Go implementation (go modules)
+  - `rust/`: Rust implementation (cargo-based)
+  - `javascript/`: JavaScript/TypeScript implementation (npm-based)
+  - `dart/`: Dart implementation (pub-based)
+  - `kotlin/`: Kotlin implementation (maven-based)
+  - `scala/`: Scala implementation (sbt-based)
+
+- **Testing and CI**:
+  - `integration_tests/`: Cross-language integration tests
+  - `.github/workflows/`: GitHub Actions CI/CD workflows
+  - `ci/`: CI scripts and configurations
+
+- **Build Configuration**:
+  - `BUILD`, `WORKSPACE`: Bazel configuration
+  - `.bazelrc`, `.bazelversion`: Bazel settings
+  - Various `pom.xml`, `package.json`, `Cargo.toml`, etc.
+
+### Important Files
+
+- **`AGENTS.md`**: This file - AI coding guidance
+- **`CLAUDE.md`**: Claude Code specific instructions
+- **`CONTRIBUTING.md`**: Contribution guidelines
+- **`README.md`**: Project overview and quick start
+- **`.gitignore`**: Git ignore patterns (includes build dirs)
+- **`licenserc.toml`**: License header configuration
+
+## Architecture Overview
+
+Apache Fory is a blazingly-fast multi-language serialization framework that 
revolutionizes data exchange between systems and languages. By leveraging JIT 
compilation, code generation and zero-copy techniques, Fory delivers up to 170x 
faster performance compared to other serialization frameworks while being 
extremely easy to use.
+
+### Binary Protocols
+
+Fory uses binary protocols for efficient serialization and deserialization. 
Fory designed and implemented multiple binary protocols for different scenarios:
+
+- **[xlang serialization 
format](docs/specification/xlang_serialization_spec.md)**:
+  - Serializes any object across languages automatically, with no need for IDL definitions, schema compilation, or object to/from protocol conversion.
+  - Supports optional shared and circular references, with no duplicate data or recursion errors.
+  - Supports object polymorphism.
+- **[Row format](docs/specification/row_format_spec.md)**: A cache-friendly 
binary random access format, supports skipping serialization and partial 
serialization, and can convert to column-format automatically.
+- **[Java serialization 
format](docs/specification/java_serialization_spec.md)**: Highly-optimized and 
drop-in replacement for Java serialization.
+- **Python serialization format**: Highly-optimized and drop-in replacement 
for Python pickle, which is an extension built upon **[xlang serialization 
format](docs/specification/xlang_serialization_spec.md)**.
+
+**`docs/specification/**` contains the specifications for the Fory protocol**. Read those documents carefully and make sure you understand them before making changes to code and documentation.
+
+### Core Structure
+
+Fory serialization is implemented independently for every language to minimize the cost of object memory layout interoperability, object allocation, and memory access, and thus maximize performance. There is no code reuse between languages except for `fory python`, which reuses code from `fory c++`.
+
+#### Java
+
+- **fory-core**: Java library implementing the core object graph serialization
+  - `java/fory-core/src/main/java/org/apache/fory/Fory.java`: main 
serialization entry point
+  - `java/fory-core/src/main/java/org/apache/fory/resolver/TypeResolver.java`: 
type resolution and serializer dispatch
+  - `java/fory-core/src/main/java/org/apache/fory/resolver/RefResolver.java`: 
class for resolving shared/circular references when ref tracking is enabled
+  - `java/fory-core/src/main/java/org/apache/fory/serializer`: serializers for 
each supported type
+  - `java/fory-core/src/main/java/org/apache/fory/codegen`: code generators, 
provide expression abstraction and compile expression tree to java code and 
byte code
+  - `java/fory-core/src/main/java/org/apache/fory/builder`: build expression 
tree for serialization to generate serialization code
+  - `java/fory-core/src/main/java/org/apache/fory/reflect`: reflection 
utilities
+  - `java/fory-core/src/main/java/org/apache/fory/type`: java generics and 
type inference utilities
+  - `java/fory-core/src/main/java/org/apache/fory/util`: utility classes
+
+- **fory-format**: Java library implementing the core row format encoding and 
decoding
+  - `java/fory-format/src/main/java/org/apache/fory/format/row`: row format 
data structures
+  - `java/fory-format/src/main/java/org/apache/fory/format/encoder`: generate 
row format encoder and decoder to encode/decode objects to/from row format
+  - `java/fory-format/src/main/java/org/apache/fory/format/type`: type 
inference for row format
+  - `java/fory-format/src/main/java/org/apache/fory/format/vectorized`: 
interoperation with apache arrow columnar format
+
+- **fory-extensions**: extension libraries for java, including:
+  - Protobuf serializers for fory java native object graph protocol.
+  - Meta compression based on zstd
+
+- **fory-simd**: SIMD-accelerated serialization and deserialization based on 
java vector API
+  - `java/fory-simd/src/main/java/org/apache/fory/util`: SIMD utilities
+  - `java/fory-simd/src/main/java/org/apache/fory/serializer`: SIMD 
accelerated serializers
+
+- **fory-test-core**: Core test utilities and data generators
+
+- **fory-testsuite**: Complex test suite for issues reported by users and hard 
to reproduce using simple test cases
+
+- **benchmark**: Benchmark suite based on jmh
+
+#### Bazel
+
+The `bazel` directory provides build support for fory c++ and cython:
+
+- `bazel/arrow`: build rules to get arrow shared libraries based on bazel 
template
+- `grpc-cython-copts.patch/grpc-python.patch`: patch for grpc to add 
`pyx_library` for cython.
+
+#### C++
+
+- `cpp/fory/row`: Row format data structures
+- `cpp/fory/meta`: Compile-time reflection utilities for extracting struct field information.
+- `cpp/fory/encoder`: Row format encoder and decoder
+- `cpp/fory/columnar`: Interoperation between fory row format and apache arrow 
columnar format
+- `cpp/fory/util`: Common utilities
+  - `cpp/fory/util/buffer.h`: Buffer for reading and writing data
+  - `cpp/fory/util/bit_util.h`: utilities for bit manipulation
+  - `cpp/fory/util/string_util.h`: String utilities
+  - `cpp/fory/util/status.h`: Status code for error handling
+
+#### Python
+
+Fory python has two implementations for the protocol:
+
+- **Python mode**: Pure python implementation based on `xlang serialization 
format`, used for debugging and testing only. This mode can be enabled by 
setting `ENABLE_FORY_CYTHON_SERIALIZATION=0` environment variable.
+- **Cython mode**: Cython based implementation based on `xlang serialization 
format`, which is used by default and has better performance than pure python. 
This mode can be enabled by setting `ENABLE_FORY_CYTHON_SERIALIZATION=1` 
environment variable.
+- **Python mode** and **Cython mode** share some code to reduce duplication.
+
+Code structure:
+
+- `python/pyfory/_serialization.pyx`: Core serialization logic and entry point 
for cython mode based on `xlang serialization format`
+- `python/pyfory/_fory.py`: Serialization entry point for pure python mode 
based on `xlang serialization format`
+- `python/pyfory/_registry.py`: Type registry, resolution and serializer dispatch for pure python mode, which is also used by cython mode. Cython mode uses a cache to reduce invocations of this module.
+- `python/pyfory/serializer.py`: Serializers for non-internal types
+- `python/pyfory/includes`: Cython headers for `c++` functions and classes.
+- `python/pyfory/resolver.py`: resolving shared/circular references when ref 
tracking is enabled in pure python mode
+- `python/pyfory/format`: Fory row format encoding and decoding, arrow 
columnar format interoperation
+- `python/pyfory/_util.pyx`: Buffer for reading/writing data, string 
utilities. Used by `_serialization.pyx` and `python/pyfory/format` at the same 
time.
+
+#### Go
+
+Fory go provides reflection-based and codegen-based serialization and 
deserialization.
+
+- `go/fory/fory.go`: serialization entry point
+- `go/fory/resolver.go`: resolving shared/circular references when ref 
tracking is enabled
+- `go/fory/type.go`: type system and type resolution, serializer dispatch
+- `go/fory/slice.go`: serializers for `slice` type
+- `go/fory/map.go`: serializers for `map` type
+- `go/fory/set.go`: serializers for `set` type
+- `go/fory/struct.go`: serializers for `struct` type
+- `go/fory/string.go`: serializers for `string` type
+- `go/fory/buffer.go`: Buffer for reading/writing data
+- `go/fory/codegen`: code generators invoked via `go:generate` to generate serialization code that speeds up serialization.
+- `go/fory/meta`: Meta string compression
+
+#### Rust
+
+Fory rust provides macro-based serialization and deserialization. Fory rust 
consists of:
+
+- **fory**: Main library entry point
+  - `rust/fory/src/lib.rs`: main library entry point to export API to users
+- **fory-core**: Core library for serialization and deserialization
+  - `rust/fory-core/src/fory.rs`: main serialization entry point
+  - `rust/fory-core/src/resolver/type_resolver.rs`: type resolution and 
registration
+  - `rust/fory-core/src/resolver/metastring_resolver.rs`: resolver for meta 
string
+  - `rust/fory-core/src/resolver/context.rs`: context for reading/writing
+  - `rust/fory-core/src/buffer.rs`: buffer for reading/writing data
+  - `rust/fory-core/src/meta`: meta string compression, type meta encoding
+  - `rust/fory-core/src/serializer`: serializers for each supported type
+  - `rust/fory-core/src/row`: row format encoding and decoding
+- **fory-derive**: Rust macro-based codegen for serialization and 
deserialization
+  - `rust/fory-derive/src/object`: macro for serializing/deserializing structs
+  - `rust/fory-derive/src/fory_row`: macro for encoding/decoding row format
+
+#### Integration Tests
+
+`integration_tests` contains integration tests with following modules:
+
+- **cpython_benchmark**: benchmark suite for fory python
+- **graalvm_tests**: test suite for fory java on graalvm
+- **jdk_compatibility_tests**: test suite for fory serialization compatibility 
between multiple JDK versions
+- **latest_jdk_tests**: test suite for `jdk17+` versions
+
+## Key Development Guidelines
+
+### Performance Guidelines
+
+- **Performance First**: Never introduce code that reduces performance without 
explicit justification
+- **Zero-Copy**: Leverage zero-copy techniques when possible
+- **JIT Compilation**: Consider JIT compilation opportunities
+- **Memory Layout**: Optimize for cache-friendly memory access patterns
+
+### Code Quality
+
+- **Public APIs**: Must be well-documented and easy to understand
+- **Error Handling**: Implement comprehensive error handling with meaningful 
messages
+- **Type Safety**: Use strong typing and generics appropriately
+- **Null Safety**: Handle null values appropriately for each language
+
+### Cross-Language Considerations
+
+- **Protocol Compatibility**: Ensure serialization compatibility across 
languages
+- **Type Mapping**: Understand type mapping between languages (see 
`docs/guide/xlang_type_mapping.md`)
+- **Endianness**: Handle byte order correctly for cross-platform compatibility
+- **Version Compatibility**: Maintain backward compatibility when possible
+
+### Testing Strategy
+
+- **Unit Tests**: Focus on internal behavior verification
+- **Integration Tests**: Use `integration_tests/` for cross-language 
compatibility
+- **Language Alignment and Protocol Compatibility**: Run `test_cross_language.py` to verify language and protocol alignment
+- **Performance Tests**: Include benchmarks for performance-critical changes
+
+### Documentation Requirements
+
+- **API Changes**: Update relevant documentation in `docs/`
+- **Protocol Changes**: Update specifications in `docs/specification/`
+- **Examples**: Provide working examples for new features
+- **Migration Guides**: Document breaking changes and migration paths
+
+## Development Workflow
+
+### Before Making Changes
+
+1. **Read Specifications**: Review relevant docs in `docs/specification/`
+2. **Understand Architecture**: Study the language-specific implementation 
structure
+3. **Check Existing Tests**: Look at existing test patterns and coverage
+4. **Review Related Issues**: Check GitHub issues for context
+
+### Making Changes
+
+1. **Follow Language Conventions**: Respect each language's idioms and patterns
+2. **Maintain Performance**: Profile performance-critical changes
+3. **Add Tests**: Include appropriate tests for new functionality
+4. **Update Documentation**: Update docs for API changes
+5. **Format Code**: Use language-specific formatters before committing
+
+## Debugging Guidelines
+
+### Protocol Issues
+
+- **Use Python Mode**: Set `ENABLE_FORY_CYTHON_SERIALIZATION=0` for debugging
+- **Check Specifications**: Refer to protocol specs in `docs/specification/`
+- **Cross-Language Testing**: Use integration tests to verify compatibility
+
+### Performance Issues
+
+- **Profile First**: Use appropriate profilers for each language
+- **Memory Analysis**: Check for memory leaks and allocation patterns
+
+### Build Issues
+
+- **Clean Builds**: Use language-specific clean commands
+- **Dependency Issues**: Check version compatibility
+- **Bazel Issues**: Use `bazel clean --expunge` for deep cleaning
+
+## CI/CD Understanding
+
+### GitHub Actions Workflows
+
+- **`ci.yml`**: Main CI workflow for all languages
+- **`build-native-*.yml`**: macOS/Windows python wheel build workflows
+- **`build-containerized-*.yml`**: Containerized python wheel build workflows 
for linux
+- **`lint.yml`**: Code formatting and linting
+- **`pr-lint.yml`**: PR-specific checks
+
+## Commit Message Format
+
+Use conventional commits with language scope:
+
+```
+feat(java): add codegen support for xlang serialization
+fix(rust): fix collection header when collection is empty
+docs(python): add docs for xlang serialization
+refactor(java): unify serialization exceptions hierarchy
+perf(cpp): optimize buffer allocation in encoder
+test(integration): add cross-language reference cycle tests
+ci: update build matrix for latest JDK versions
+chore(deps): update arrow dependency to 15.0.0
+```
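
As a rough illustration of the commit message format described above, here is a minimal sketch of a local check. It assumes only the type prefixes that appear in the examples (`feat`, `fix`, `docs`, `refactor`, `perf`, `test`, `ci`, `chore`) and an optional lowercase scope. The function name `is_valid_commit_msg` and the regular expression are hypothetical and not part of the Fory tooling; the project's own PR lint may accept a different set.

```shell
# Hypothetical helper (not part of Fory): succeeds when the first line of the
# message looks like "type(scope): summary" or "type: summary".
is_valid_commit_msg() {
  printf '%s\n' "$1" | head -n 1 |
    grep -Eq '^(feat|fix|docs|refactor|perf|test|ci|chore)(\([a-z0-9_-]+\))?: .+'
}

# Example taken from the list above.
if is_valid_commit_msg "fix(rust): fix collection header when collection is empty"; then
  echo "ok"
else
  echo "invalid"
fi
```

The scope is left optional because the examples include an unscoped `ci:` message.
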
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 000000000..47dc3e3d8
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
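
The `120000` file mode above marks `CLAUDE.md` as a symbolic link whose stored content is the single path `AGENTS.md`. A minimal sketch of reproducing such a link in a scratch directory follows; the temporary directory and the placeholder file content are assumptions for illustration, not the commands the author ran.

```shell
# Work in a throwaway directory so no real checkout is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Placeholder content; the real AGENTS.md is the file added by this commit.
echo "# AGENTS.md" > AGENTS.md

# Relative symlink, so it keeps resolving wherever the repository is cloned.
ln -s AGENTS.md CLAUDE.md

# Print the link target; git stores this path as the blob for mode 120000.
readlink CLAUDE.md
```
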


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
