This is an automated email from the ASF dual-hosted git repository. chaokunyang pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/fory-site.git
commit 9792ff6465ad055256e61f2437e3a5dab313778f Author: chaokunyang <[email protected]> AuthorDate: Wed Jan 21 04:57:10 2026 +0000 🔄 created local 'docs/compiler/' from remote 'docs/compiler/' --- docs/compiler/_category_.json | 4 + docs/compiler/compiler-guide.md | 612 ++++++++++++++++++++ docs/compiler/fdl-syntax.md | 1188 +++++++++++++++++++++++++++++++++++++++ docs/compiler/generated-code.md | 828 +++++++++++++++++++++++++++ docs/compiler/index.md | 206 +++++++ docs/compiler/proto-vs-fdl.md | 507 +++++++++++++++++ docs/compiler/type-system.md | 449 +++++++++++++++ 7 files changed, 3794 insertions(+) diff --git a/docs/compiler/_category_.json b/docs/compiler/_category_.json new file mode 100644 index 000000000..c2cd9e2fe --- /dev/null +++ b/docs/compiler/_category_.json @@ -0,0 +1,4 @@ +{ + "position": 3, + "label": "Schema IDL & Compiler" +} diff --git a/docs/compiler/compiler-guide.md b/docs/compiler/compiler-guide.md new file mode 100644 index 000000000..0267a9667 --- /dev/null +++ b/docs/compiler/compiler-guide.md @@ -0,0 +1,612 @@ +--- +title: FDL Compiler Guide +sidebar_position: 4 +id: fdl_compiler_guide +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--- + +This guide covers installation, usage, and integration of the FDL compiler. + +## Installation + +### From Source + +```bash +cd compiler +pip install -e . +``` + +### Verify Installation + +```bash +fory compile --help +``` + +## Command Line Interface + +### Basic Usage + +```bash +fory compile [OPTIONS] FILES... +``` + +### Options + +| Option | Description | Default | +| ------------------------------------- | ----------------------------------------------------- | ------------- | +| `--lang` | Comma-separated target languages | `all` | +| `--output`, `-o` | Output directory | `./generated` | +| `--package` | Override package name from FDL file | (from file) | +| `-I`, `--proto_path`, `--import_path` | Add directory to import search path (can be repeated) | (none) | +| `--java_out=DST_DIR` | Generate Java code in DST_DIR | (none) | +| `--python_out=DST_DIR` | Generate Python code in DST_DIR | (none) | +| `--cpp_out=DST_DIR` | Generate C++ code in DST_DIR | (none) | +| `--go_out=DST_DIR` | Generate Go code in DST_DIR | (none) | +| `--rust_out=DST_DIR` | Generate Rust code in DST_DIR | (none) | +| `--go_nested_type_style` | Go nested type naming: `camelcase` or `underscore` | (none) | + +### Examples + +**Compile for all languages:** + +```bash +fory compile schema.fdl +``` + +**Compile for specific languages:** + +```bash +fory compile schema.fdl --lang java,python +``` + +**Specify output directory:** + +```bash +fory compile schema.fdl --output ./src/generated +``` + +**Override package name:** + +```bash +fory compile schema.fdl --package com.myapp.models +``` + +**Compile multiple files:** + +```bash +fory compile user.fdl order.fdl product.fdl --output ./generated +``` + +**Use import search paths:** + +```bash +# Add a single import path +fory compile src/main.fdl -I libs/common + +# Add multiple import paths (repeated option) +fory compile src/main.fdl -I libs/common -I libs/types + +# Add multiple import paths (comma-separated) +fory compile 
src/main.fdl -I libs/common,libs/types,third_party/ + +# Using --proto_path (protoc-compatible alias) +fory compile src/main.fdl --proto_path=libs/common + +# Mix all styles +fory compile src/main.fdl -I libs/common,libs/types --proto_path third_party/ +``` + +**Language-specific output directories (protoc-style):** + +```bash +# Generate only Java code to a specific directory +fory compile schema.fdl --java_out=./src/main/java + +# Generate multiple languages to different directories +fory compile schema.fdl --java_out=./java/gen --python_out=./python/src --go_out=./go/gen + +# Combine with import paths +fory compile schema.fdl --java_out=./gen/java -I proto/ -I common/ +``` + +When using `--{lang}_out` options: + +- Only the specified languages are generated (not all languages) +- Files are placed directly in the specified directory (not in a `{lang}/` subdirectory) +- This is compatible with protoc-style workflows + +## Import Path Resolution + +When compiling FDL files with imports, the compiler searches for imported files in this order: + +1. **Relative to the importing file (default)** - The directory containing the file with the import statement is always searched first, automatically. No `-I` flag needed for same-directory imports. +2. **Each `-I` path in order** - Additional search paths specified on the command line + +**Same-directory imports work automatically:** + +```proto +// main.fdl +import "common.fdl"; // Found if common.fdl is in the same directory +``` + +```bash +# No -I needed for same-directory imports +fory compile main.fdl +``` + +**Example project structure:** + +``` +project/ +├── src/ +│ └── main.fdl # import "common.fdl"; +└── libs/ + └── common.fdl +``` + +**Without `-I` (fails):** + +```bash +$ fory compile src/main.fdl +Import error: Import not found: common.fdl + Searched in: /project/src +``` + +**With `-I` (succeeds):** + +```bash +$ fory compile src/main.fdl -I libs/ +Compiling src/main.fdl... 
+ Resolved 1 import(s) +``` + +## Supported Languages + +| Language | Flag | Output Extension | Description | +| -------- | -------- | ---------------- | --------------------------- | +| Java | `java` | `.java` | POJOs with Fory annotations | +| Python | `python` | `.py` | Dataclasses with type hints | +| Go | `go` | `.go` | Structs with struct tags | +| Rust | `rust` | `.rs` | Structs with derive macros | +| C++ | `cpp` | `.h` | Structs with FORY macros | + +## Output Structure + +### Java + +``` +generated/ +└── java/ + └── com/ + └── example/ + ├── User.java + ├── Order.java + ├── Status.java + └── ExampleForyRegistration.java +``` + +- One file per type (enum or message) +- Package structure matches FDL package +- Registration helper class generated + +### Python + +``` +generated/ +└── python/ + └── example.py +``` + +- Single module with all types +- Module name derived from package +- Registration function included + +### Go + +``` +generated/ +└── go/ + └── example.go +``` + +- Single file with all types +- Package name from last component of FDL package +- Registration function included + +### Rust + +``` +generated/ +└── rust/ + └── example.rs +``` + +- Single module with all types +- Module name derived from package +- Registration function included + +### C++ + +``` +generated/ +└── cpp/ + └── example.h +``` + +- Single header file +- Namespace matches package (dots to `::`) +- Header guards and forward declarations + +## Build Integration + +### Maven (Java) + +Add to your `pom.xml`: + +```xml +<build> + <plugins> + <plugin> + <groupId>org.codehaus.mojo</groupId> + <artifactId>exec-maven-plugin</artifactId> + <version>3.1.0</version> + <executions> + <execution> + <id>generate-fory-types</id> + <phase>generate-sources</phase> + <goals> + <goal>exec</goal> + </goals> + <configuration> + <executable>fory</executable> + <arguments> + <argument>compile</argument> + <argument>${project.basedir}/src/main/fdl/schema.fdl</argument> + 
<argument>--lang</argument> + <argument>java</argument> + <argument>--output</argument> + <argument>${project.build.directory}/generated-sources/fdl</argument> + </arguments> + </configuration> + </execution> + </executions> + </plugin> + </plugins> +</build> +``` + +Add generated sources: + +```xml +<build> + <plugins> + <plugin> + <groupId>org.codehaus.mojo</groupId> + <artifactId>build-helper-maven-plugin</artifactId> + <version>3.4.0</version> + <executions> + <execution> + <phase>generate-sources</phase> + <goals> + <goal>add-source</goal> + </goals> + <configuration> + <sources> + <source>${project.build.directory}/generated-sources/fdl</source> + </sources> + </configuration> + </execution> + </executions> + </plugin> + </plugins> +</build> +``` + +### Gradle (Java/Kotlin) + +Add to `build.gradle`: + +```groovy +task generateForyTypes(type: Exec) { + commandLine 'fory', 'compile', + "${projectDir}/src/main/fdl/schema.fdl", + '--lang', 'java', + '--output', "${buildDir}/generated/sources/fdl" +} + +compileJava.dependsOn generateForyTypes + +sourceSets { + main { + java { + srcDir "${buildDir}/generated/sources/fdl/java" + } + } +} +``` + +### Python (setuptools) + +Add to `setup.py` or `pyproject.toml`: + +```python +# setup.py +from setuptools import setup +from setuptools.command.build_py import build_py +import subprocess + +class BuildWithFdl(build_py): + def run(self): + subprocess.run([ + 'fory', 'compile', + 'schema.fdl', + '--lang', 'python', + '--output', 'src/generated' + ], check=True) + super().run() + +setup( + cmdclass={'build_py': BuildWithFdl}, + # ... +) +``` + +### Go (go generate) + +Add to your Go file: + +```go +//go:generate fory compile ../schema.fdl --lang go --output . +package models +``` + +Run: + +```bash +go generate ./... 
+``` + +### Rust (build.rs) + +Add to `build.rs`: + +```rust +use std::process::Command; + +fn main() { + println!("cargo:rerun-if-changed=schema.fdl"); + + let status = Command::new("fory") + .args(&["compile", "schema.fdl", "--lang", "rust", "--output", "src/generated"]) + .status() + .expect("Failed to run fory compiler"); + + if !status.success() { + panic!("FDL compilation failed"); + } +} +``` + +### CMake (C++) + +Add to `CMakeLists.txt`: + +```cmake +find_program(FORY_COMPILER fory) + +add_custom_command( + OUTPUT ${CMAKE_CURRENT_SOURCE_DIR}/generated/example.h + COMMAND ${FORY_COMPILER} compile + ${CMAKE_CURRENT_SOURCE_DIR}/schema.fdl + --lang cpp + --output ${CMAKE_CURRENT_SOURCE_DIR}/generated + DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/schema.fdl + COMMENT "Generating FDL types" +) + +add_custom_target(generate_fdl DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/generated/example.h) + +add_library(mylib ...) +add_dependencies(mylib generate_fdl) +target_include_directories(mylib PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/generated) +``` + +### Bazel + +Create a rule in `BUILD`: + +```python +genrule( + name = "generate_fdl", + srcs = ["schema.fdl"], + outs = ["generated/example.h"], + cmd = "$(location //:fory_compiler) compile $(SRCS) --lang cpp --output $(RULEDIR)/generated", + tools = ["//:fory_compiler"], +) + +cc_library( + name = "models", + hdrs = [":generate_fdl"], + # ... +) +``` + +## Error Handling + +### Syntax Errors + +``` +Error: Line 5, Column 12: Expected ';' after field declaration +``` + +Fix: Check the indicated line for missing semicolons or syntax issues. + +### Duplicate Type Names + +``` +Error: Duplicate type name: User +``` + +Fix: Ensure each enum and message has a unique name within the file. + +### Duplicate Type IDs + +``` +Error: Duplicate type ID 100: User and Order +``` + +Fix: Assign unique type IDs to each type. 
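The duplicate-ID check described above can be pictured with a short Python sketch. This is not the compiler's implementation: `find_duplicate_type_ids` and the shape of its input are hypothetical, chosen only to show the idea of grouping type names by ID and reporting any ID claimed more than once.

```python
from collections import defaultdict


def find_duplicate_type_ids(type_ids):
    """Return {type_id: [names...]} for every ID used by more than one type.

    `type_ids` maps a type name to its declared ID (hypothetical input shape,
    not the compiler's internal representation).
    """
    names_by_id = defaultdict(list)
    for name, type_id in type_ids.items():
        names_by_id[type_id].append(name)
    # Keep only IDs that are shared by two or more types.
    return {tid: names for tid, names in names_by_id.items() if len(names) > 1}


declared = {"User": 100, "Order": 100, "Status": 101}
duplicates = find_duplicate_type_ids(declared)
for type_id, names in duplicates.items():
    print(f"Duplicate type ID {type_id}: {' and '.join(names)}")
# prints "Duplicate type ID 100: User and Order"
```

Running a check like this in CI before code generation may catch ID collisions across files earlier than a full compile.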
+ +### Unknown Type References + +``` +Error: Unknown type 'Address' in Customer.address +``` + +Fix: Define the referenced type before using it, or check for typos. + +### Duplicate Field Numbers + +``` +Error: Duplicate field number 1 in User: name and id +``` + +Fix: Assign unique field numbers within each message. + +## Best Practices + +### Project Structure + +``` +project/ +├── fdl/ +│ ├── common.fdl # Shared types +│ ├── user.fdl # User domain +│ └── order.fdl # Order domain +├── src/ +│ └── generated/ # Generated code (git-ignored) +└── build.gradle +``` + +### Version Control + +- **Track**: FDL schema files +- **Ignore**: Generated code (can be regenerated) + +Add to `.gitignore`: + +``` +# Generated FDL code +src/generated/ +generated/ +``` + +### CI/CD Integration + +Always regenerate during builds: + +```yaml +# GitHub Actions example +steps: + - name: Install FDL Compiler + run: pip install ./compiler + + - name: Generate Types + run: fory compile fdl/*.fdl --output src/generated + + - name: Build + run: ./gradlew build +``` + +### Schema Evolution + +When modifying schemas: + +1. **Never reuse field numbers** - Mark as reserved instead +2. **Never change type IDs** - They're part of the binary format +3. **Add new fields** - Use new field numbers +4. 
**Use `optional`** - For backward compatibility + +```proto +message User [id=100] { + string id = 1; + string name = 2; + // Field 3 was removed, don't reuse + optional string email = 4; // New field +} +``` + +## Troubleshooting + +### Command Not Found + +``` +fory: command not found +``` + +**Solution:** Ensure the compiler is installed and in your PATH: + +```bash +pip install -e ./compiler +# Or add to PATH +export PATH=$PATH:~/.local/bin +``` + +### Permission Denied + +``` +Permission denied: ./generated +``` + +**Solution:** Ensure write permissions on the output directory: + +```bash +chmod -R u+w ./generated +``` + +### Import Errors in Generated Code + +**Java:** Ensure Fory dependency is in your project: + +```xml +<dependency> + <groupId>org.apache.fory</groupId> + <artifactId>fory-core</artifactId> + <version>0.14.1</version> +</dependency> +``` + +**Python:** Ensure pyfory is installed: + +```bash +pip install pyfory +``` + +**Go:** Ensure fory module is available: + +```bash +go get github.com/apache/fory/go/fory +``` + +**Rust:** Ensure fory crate is in `Cargo.toml`: + +```toml +[dependencies] +fory = "0.13" +``` + +**C++:** Ensure Fory headers are in include path. diff --git a/docs/compiler/fdl-syntax.md b/docs/compiler/fdl-syntax.md new file mode 100644 index 000000000..c496b8cfb --- /dev/null +++ b/docs/compiler/fdl-syntax.md @@ -0,0 +1,1188 @@ +--- +title: FDL Syntax Reference +sidebar_position: 2 +id: fdl_syntax +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +This document provides a complete reference for the Fory Definition Language (FDL) syntax. + +## File Structure + +An FDL file consists of: + +1. Optional package declaration +2. Optional import statements +3. Type definitions (enums and messages) + +```proto +// Optional package declaration +package com.example.models; + +// Import statements +import "common/types.fdl"; + +// Type definitions +enum Color [id=100] { ... } +message User [id=101] { ... } +message Order [id=102] { ... } +``` + +## Comments + +FDL supports both single-line and block comments: + +```proto +// This is a single-line comment + +/* + * This is a block comment + * that spans multiple lines + */ + +message Example { + string name = 1; // Inline comment +} +``` + +## Package Declaration + +The package declaration defines the namespace for all types in the file. + +```proto +package com.example.models; +``` + +**Rules:** + +- Optional but recommended +- Must appear before any type definitions +- Only one package declaration per file +- Used for namespace-based type registration + +**Language Mapping:** + +| Language | Package Usage | +| -------- | --------------------------------- | +| Java | Java package | +| Python | Module name (dots to underscores) | +| Go | Package name (last component) | +| Rust | Module name (dots to underscores) | +| C++ | Namespace (dots to `::`) | + +## File-Level Options + +Options can be specified at file level to control language-specific code generation. 
+ +### Syntax + +```proto +option option_name = value; +``` + +### Java Package Option + +Override the Java package for generated code: + +```proto +package payment; +option java_package = "com.mycorp.payment.v1"; + +message Payment { + string id = 1; +} +``` + +**Effect:** + +- Generated Java files will be in `com/mycorp/payment/v1/` directory +- Java package declaration will be `package com.mycorp.payment.v1;` +- Type registration still uses the FDL package (`payment`) for cross-language compatibility + +### Go Package Option + +Specify the Go import path and package name: + +```proto +package payment; +option go_package = "github.com/mycorp/apis/gen/payment/v1;paymentv1"; + +message Payment { + string id = 1; +} +``` + +**Format:** `"import/path;package_name"` or just `"import/path"` (last segment used as package name) + +**Effect:** + +- Generated Go files will have `package paymentv1` +- The import path can be used in other Go code +- Type registration still uses the FDL package (`payment`) for cross-language compatibility + +### Java Outer Classname Option + +Generate all types as inner classes of a single outer wrapper class: + +```proto +package payment; +option java_outer_classname = "DescriptorProtos"; + +enum Status { + UNKNOWN = 0; + ACTIVE = 1; +} + +message Payment { + string id = 1; + Status status = 2; +} +``` + +**Effect:** + +- Generates a single file `DescriptorProtos.java` instead of separate files +- All enums and messages become `public static` inner classes +- The outer class is `public final` with a private constructor +- Useful for grouping related types together + +**Generated structure:** + +```java +public final class DescriptorProtos { + private DescriptorProtos() {} + + public static enum Status { + UNKNOWN, + ACTIVE; + } + + public static class Payment { + private String id; + private Status status; + // ... 
+ } +} +``` + +**Combined with java_package:** + +```proto +package payment; +option java_package = "com.example.proto"; +option java_outer_classname = "PaymentProtos"; + +message Payment { + string id = 1; +} +``` + +This generates `com/example/proto/PaymentProtos.java` with all types as inner classes. + +### Java Multiple Files Option + +Control whether types are generated in separate files or as inner classes: + +```proto +package payment; +option java_outer_classname = "PaymentProtos"; +option java_multiple_files = true; + +message Payment { + string id = 1; +} + +message Receipt { + string id = 1; +} +``` + +**Behavior:** + +| `java_outer_classname` | `java_multiple_files` | Result | +| ---------------------- | --------------------- | ------------------------------------------- | +| Not set | Any | Separate files (one per type) | +| Set | `false` (default) | Single file with all types as inner classes | +| Set | `true` | Separate files (overrides outer class) | + +**Effect of `java_multiple_files = true`:** + +- Each top-level enum and message gets its own `.java` file +- Overrides `java_outer_classname` behavior +- Useful when you want separate files but still specify an outer class name for other purposes + +**Example without java_multiple_files (default):** + +```proto +option java_outer_classname = "PaymentProtos"; +// Generates: PaymentProtos.java containing Payment and Receipt as inner classes +``` + +**Example with java_multiple_files = true:** + +```proto +option java_outer_classname = "PaymentProtos"; +option java_multiple_files = true; +// Generates: Payment.java, Receipt.java (separate files) +``` + +### Multiple Options + +Multiple options can be specified: + +```proto +package payment; +option java_package = "com.mycorp.payment.v1"; +option go_package = "github.com/mycorp/apis/gen/payment/v1;paymentv1"; +option deprecated = true; + +message Payment { + string id = 1; +} +``` + +### Fory Extension Options + +FDL supports protobuf-style extension 
options for Fory-specific configuration: + +```proto +option (fory).use_record_for_java_message = true; +option (fory).polymorphism = true; +``` + +**Available File Options:** + +| Option | Type | Description | +| ----------------------------- | ------ | ------------------------------------------------------------ | +| `use_record_for_java_message` | bool | Generate Java records instead of classes | +| `polymorphism` | bool | Enable polymorphism for all types | +| `go_nested_type_style` | string | Go nested type naming: `underscore` (default) or `camelcase` | + +See the [Fory Extension Options](#fory-extension-options) section for complete documentation of message, enum, and field options. + +### Option Priority + +For language-specific packages: + +1. Command-line package override (highest priority) +2. Language-specific option (`java_package`, `go_package`) +3. FDL package declaration (fallback) + +**Example:** + +```proto +package myapp.models; +option java_package = "com.example.generated"; +``` + +| Scenario | Java Package Used | +| ------------------------- | ------------------------- | +| No override | `com.example.generated` | +| CLI: `--package=override` | `override` | +| No java_package option | `myapp.models` (fallback) | + +### Cross-Language Type Registration + +Language-specific options only affect where code is generated, not the type namespace used for serialization. This ensures cross-language compatibility: + +```proto +package myapp.models; +option java_package = "com.mycorp.generated"; +option go_package = "github.com/mycorp/gen;genmodels"; + +message User { + string name = 1; +} +``` + +All languages will register `User` with namespace `myapp.models`, enabling: + +- Java serialized data → Go deserialization +- Go serialized data → Java deserialization +- Any language combination works seamlessly + +## Import Statement + +Import statements allow you to use types defined in other FDL files. 
+ +### Basic Syntax + +```proto +import "path/to/file.fdl"; +``` + +### Multiple Imports + +```proto +import "common/types.fdl"; +import "common/enums.fdl"; +import "models/address.fdl"; +``` + +### Path Resolution + +Import paths are resolved relative to the importing file: + +``` +project/ +├── common/ +│ └── types.fdl +├── models/ +│ ├── user.fdl # import "../common/types.fdl" +│ └── order.fdl # import "../common/types.fdl" +└── main.fdl # import "common/types.fdl" +``` + +**Rules:** + +- Import paths are quoted strings (double or single quotes) +- Paths are resolved relative to the importing file's directory +- Imported types become available as if defined in the current file +- Circular imports are detected and reported as errors +- Transitive imports work (if A imports B and B imports C, A has access to C's types) + +### Complete Example + +**common/types.fdl:** + +```proto +package common; + +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} + +message Address [id=101] { + string street = 1; + string city = 2; + string country = 3; +} +``` + +**models/user.fdl:** + +```proto +package models; +import "../common/types.fdl"; + +message User [id=200] { + string id = 1; + string name = 2; + Address home_address = 3; // Uses imported type + Status status = 4; // Uses imported enum +} +``` + +### Unsupported Import Syntax + +The following protobuf import modifiers are **not supported**: + +```proto +// NOT SUPPORTED - will produce an error +import public "other.fdl"; +import weak "other.fdl"; +``` + +**`import public`**: FDL uses a simpler import model. All imported types are available to the importing file only. Re-exporting is not supported. Import each file directly where needed. + +**`import weak`**: FDL requires all imports to be present at compile time. Optional dependencies are not supported. 
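The circular-import rule above can be illustrated with a small depth-first search. This is a sketch under stated assumptions, not the compiler's code: `find_import_cycle` is a hypothetical helper, and the import graph is given as a plain dictionary rather than being built by parsing FDL files.

```python
def find_import_cycle(imports):
    """Return one import cycle as a list of files, or None if there is none.

    `imports` maps each file to the files it imports (hypothetical input;
    a real compiler would build this while resolving import statements).
    """
    visiting, done = [], set()

    def visit(path):
        if path in done:
            return None
        if path in visiting:
            # Slice from the first occurrence so the whole loop is reported.
            return visiting[visiting.index(path):] + [path]
        visiting.append(path)
        for dep in imports.get(path, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(path)
        return None

    for start in imports:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


graph = {"a.fdl": ["b.fdl"], "b.fdl": ["c.fdl"], "c.fdl": ["a.fdl"]}
cycle = find_import_cycle(graph)
acyclic = find_import_cycle({"main.fdl": ["common.fdl"], "common.fdl": []})
print(" -> ".join(cycle))
```

The same traversal covers both direct cycles (A imports B, B imports A) and indirect ones that pass through several files.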
+ +### Import Errors + +The compiler reports errors for: + +- **File not found**: The imported file doesn't exist +- **Circular import**: A imports B which imports A (directly or indirectly) +- **Parse errors**: Syntax errors in imported files +- **Unsupported syntax**: `import public` or `import weak` + +## Enum Definition + +Enums define a set of named integer constants. + +### Basic Syntax + +```proto +enum Status { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} +``` + +### With Type ID + +```proto +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} +``` + +### Reserved Values + +Reserve field numbers or names to prevent reuse: + +```proto +enum Status { + reserved 2, 15, 9 to 11, 40 to max; // Reserved numbers + reserved "OLD_STATUS", "DEPRECATED"; // Reserved names + PENDING = 0; + ACTIVE = 1; + COMPLETED = 3; +} +``` + +### Enum Options + +Options can be specified within enums: + +```proto +enum Status { + option deprecated = true; // Allowed + PENDING = 0; + ACTIVE = 1; +} +``` + +**Forbidden Options:** + +- `option allow_alias = true` is **not supported**. Each enum value must have a unique integer. + +### Enum Prefix Stripping + +When enum values use a protobuf-style prefix (enum name in UPPER_SNAKE_CASE), the compiler automatically strips the prefix for languages with scoped enums: + +```proto +// Input with prefix +enum DeviceTier { + DEVICE_TIER_UNKNOWN = 0; + DEVICE_TIER_TIER1 = 1; + DEVICE_TIER_TIER2 = 2; +} +``` + +**Generated code:** + +| Language | Output | Style | +| -------- | ----------------------------------------- | -------------- | +| Java | `UNKNOWN, TIER1, TIER2` | Scoped enum | +| Rust | `Unknown, Tier1, Tier2` | Scoped enum | +| C++ | `UNKNOWN, TIER1, TIER2` | Scoped enum | +| Python | `UNKNOWN, TIER1, TIER2` | Scoped IntEnum | +| Go | `DeviceTierUnknown, DeviceTierTier1, ...` | Unscoped const | + +**Note:** The prefix is only stripped if the remainder is a valid identifier. 
For example, `DEVICE_TIER_1` is kept unchanged because `1` is not a valid identifier name. + +**Grammar:** + +``` +enum_def := 'enum' IDENTIFIER [type_options] '{' enum_body '}' +type_options := '[' type_option (',' type_option)* ']' +type_option := IDENTIFIER '=' option_value +enum_body := (option_stmt | reserved_stmt | enum_value)* +option_stmt := 'option' IDENTIFIER '=' option_value ';' +reserved_stmt := 'reserved' reserved_items ';' +enum_value := IDENTIFIER '=' INTEGER ';' +``` + +**Rules:** + +- Enum names must be unique within the file +- Enum values must have explicit integer assignments +- Value integers must be unique within the enum (no aliases) +- Type ID (`[id=100]`) is optional but recommended for cross-language use + +**Example with All Features:** + +```proto +// HTTP status code categories +enum HttpCategory [id=200] { + reserved 10 to 20; // Reserved for future use + reserved "UNKNOWN"; // Reserved name + INFORMATIONAL = 1; + SUCCESS = 2; + REDIRECTION = 3; + CLIENT_ERROR = 4; + SERVER_ERROR = 5; +} +``` + +## Message Definition + +Messages define structured data types with typed fields. 
+ +### Basic Syntax + +```proto +message Person { + string name = 1; + int32 age = 2; +} +``` + +### With Type ID + +```proto +message Person [id=101] { + string name = 1; + int32 age = 2; +} +``` + +### Reserved Fields + +Reserve field numbers or names to prevent reuse after removing fields: + +```proto +message User { + reserved 2, 15, 9 to 11; // Reserved field numbers + reserved "old_field", "temp"; // Reserved field names + string id = 1; + string name = 3; +} +``` + +### Message Options + +Options can be specified within messages: + +```proto +message User { + option deprecated = true; + string id = 1; + string name = 2; +} +``` + +**Grammar:** + +``` +message_def := 'message' IDENTIFIER [type_options] '{' message_body '}' +type_options := '[' type_option (',' type_option)* ']' +type_option := IDENTIFIER '=' option_value +message_body := (option_stmt | reserved_stmt | nested_type | field_def)* +nested_type := enum_def | message_def +``` + +## Nested Types + +Messages can contain nested message and enum definitions. This is useful for defining types that are closely related to their parent message. 
+ +### Nested Messages + +```proto +message SearchResponse { + message Result { + string url = 1; + string title = 2; + repeated string snippets = 3; + } + repeated Result results = 1; +} +``` + +### Nested Enums + +```proto +message Container { + enum Status { + STATUS_UNKNOWN = 0; + STATUS_ACTIVE = 1; + STATUS_INACTIVE = 2; + } + Status status = 1; +} +``` + +### Qualified Type Names + +Nested types can be referenced from other messages using qualified names (Parent.Child): + +```proto +message SearchResponse { + message Result { + string url = 1; + string title = 2; + } +} + +message SearchResultCache { + // Reference nested type with qualified name + SearchResponse.Result cached_result = 1; + repeated SearchResponse.Result all_results = 2; +} +``` + +### Deeply Nested Types + +Nesting can be multiple levels deep: + +```proto +message Outer { + message Middle { + message Inner { + string value = 1; + } + Inner inner = 1; + } + Middle middle = 1; +} + +message OtherMessage { + // Reference deeply nested type + Outer.Middle.Inner deep_ref = 1; +} +``` + +### Language-Specific Generation + +| Language | Nested Type Generation | +| -------- | --------------------------------------------------------------------------------- | +| Java | Static inner classes (`SearchResponse.Result`) | +| Python | Nested classes within dataclass | +| Go | Flat structs with underscore (`SearchResponse_Result`, configurable to camelcase) | +| Rust | Nested modules (`search_response::Result`) | +| C++ | Nested classes (`SearchResponse::Result`) | + +**Note:** Go defaults to underscore-separated nested names; set `option (fory).go_nested_type_style = "camelcase";` to use concatenated names. Rust emits nested modules for nested types. 
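The naming table above can be restated as a small Python sketch that maps a qualified FDL name to the per-language forms shown there. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that `camelcase` for Go means joining the parts with no separator, and that every Rust parent becomes a snake_case module.

```python
import re


def to_snake_case(name):
    """Convert CamelCase to snake_case, e.g. SearchResponse -> search_response."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def nested_type_name(qualified, lang, go_style="underscore"):
    """Render a qualified FDL name such as 'SearchResponse.Result' for `lang`."""
    parts = qualified.split(".")
    if lang == "java":
        return ".".join(parts)
    if lang == "cpp":
        return "::".join(parts)
    if lang == "go":
        # Assumption: 'camelcase' concatenates the parts without a separator.
        separator = "_" if go_style == "underscore" else ""
        return separator.join(parts)
    if lang == "rust":
        # Assumption: each parent message becomes a snake_case module.
        modules = [to_snake_case(part) for part in parts[:-1]]
        return "::".join(modules + [parts[-1]])
    raise ValueError(f"unsupported language: {lang}")


for lang in ("java", "cpp", "go", "rust"):
    print(lang, nested_type_name("SearchResponse.Result", lang))
```

Python is left out because the table describes nested classes there rather than a distinct naming scheme.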
+ +### Nested Type Rules + +- Nested type names must be unique within their parent message +- Nested types can have their own type IDs +- Type IDs must be globally unique (including nested types) +- Within a message, you can reference nested types by simple name +- From outside, use the qualified name (`Parent.Child`) + +## Field Definition + +Fields define the properties of a message. + +### Basic Syntax + +```proto +field_type field_name = field_number; +``` + +### With Modifiers + +```proto +optional repeated string tags = 1; // Nullable list +repeated optional string aliases = 2; // Elements may be null +ref repeated Node nodes = 3; // Collection tracked as a reference +repeated ref Node children = 4; // Elements tracked as references +``` + +**Grammar:** + +``` +field_def := [modifiers] field_type IDENTIFIER '=' INTEGER ';' +modifiers := { 'optional' | 'ref' } ['repeated' { 'optional' | 'ref' }] +field_type := primitive_type | named_type | map_type +``` + +Modifiers before `repeated` apply to the field/collection. Modifiers after +`repeated` apply to list elements. 
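The placement rule for modifiers can be sketched in a few lines of Python. This is only an illustration of the grammar above, assuming the field has already been split into whitespace-separated tokens; `split_modifiers` is a hypothetical helper, not part of the FDL compiler.

```python
FIELD_MODIFIERS = {"optional", "ref"}


def split_modifiers(tokens):
    """Split leading modifier tokens into (field_level, element_level, is_repeated).

    Modifiers seen before `repeated` belong to the field or collection;
    modifiers seen after it belong to the list elements.
    """
    field_level, element_level = [], []
    is_repeated = False
    for token in tokens:
        if token == "repeated":
            if is_repeated:
                raise ValueError("'repeated' may appear only once")
            is_repeated = True
        elif token in FIELD_MODIFIERS:
            (element_level if is_repeated else field_level).append(token)
        else:
            break  # The first non-modifier token is the field type.
    return field_level, element_level, is_repeated


print(split_modifiers("optional repeated string tags = 1 ;".split()))
print(split_modifiers("repeated ref Node children = 4 ;".split()))
```

The first call reports `optional` as a field-level modifier (a nullable list), while the second reports `ref` as an element-level modifier (elements tracked as references).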
+ +### Field Modifiers + +#### `optional` + +Marks the field as nullable: + +```proto +message User { + string name = 1; // Required, non-null + optional string email = 2; // Nullable +} +``` + +**Generated Code:** + +| Language | Non-optional       | Optional                                        | +| -------- | ------------------ | ----------------------------------------------- | +| Java     | `String name`      | `String email` with `@ForyField(nullable=true)` | +| Python   | `name: str`        | `email: Optional[str]`                          | +| Go       | `Name string`      | `Email *string`                                 | +| Rust     | `name: String`     | `email: Option<String>`                         | +| C++      | `std::string name` | `std::optional<std::string> email`              | + +#### `ref` + +Enables reference tracking for shared/circular references: + +```proto +message Node { + string value = 1; + ref Node parent = 2; // Can point to shared object + repeated ref Node children = 3; +} +``` + +**Use Cases:** + +- Shared objects (same object referenced multiple times) +- Circular references (object graphs with cycles) +- Tree structures with parent pointers + +**Generated Code:** + +| Language | Without `ref`  | With `ref`                                | +| -------- | -------------- | ----------------------------------------- | +| Java     | `Node parent`  | `Node parent` with `@ForyField(ref=true)` | +| Python   | `parent: Node` | `parent: Node = pyfory.field(ref=True)`   | +| Go       | `Parent Node`  | `Parent *Node` with `fory:"ref"`          | +| Rust     | `parent: Node` | `parent: Arc<Node>`                       | +| C++      | `Node parent`  | `std::shared_ptr<Node> parent`            | + +#### `repeated` + +Marks the field as a list/array: + +```proto +message Document { + repeated string tags = 1; + repeated User authors = 2; +} +``` + +**Generated Code:** + +| Language | Type                       | +| -------- | -------------------------- | +| Java     | `List<String>`             | +| Python   | `List[str]`                | +| Go       | `[]string`                 | +| Rust     | `Vec<String>`              | +| C++      | `std::vector<std::string>` | + +### Combining Modifiers + +Modifiers can be combined: + +```proto +message Example { + optional repeated string tags = 1; // Nullable list + repeated 
optional string aliases = 2; // Elements may be null + ref repeated Node nodes = 3; // Collection tracked as a reference + repeated ref Node children = 4; // Elements tracked as references + optional ref User owner = 5; // Nullable tracked reference +} +``` + +Modifiers before `repeated` apply to the field/collection. Modifiers after +`repeated` apply to elements. + +## Type System + +### Primitive Types + +| Type | Description | Size | +| ----------- | --------------------------- | -------- | +| `bool` | Boolean value | 1 byte | +| `int8` | Signed 8-bit integer | 1 byte | +| `int16` | Signed 16-bit integer | 2 bytes | +| `int32` | Signed 32-bit integer | 4 bytes | +| `int64` | Signed 64-bit integer | 8 bytes | +| `float32` | 32-bit floating point | 4 bytes | +| `float64` | 64-bit floating point | 8 bytes | +| `string` | UTF-8 string | Variable | +| `bytes` | Binary data | Variable | +| `date` | Calendar date | Variable | +| `timestamp` | Date and time with timezone | Variable | + +See [Type System](type-system.md) for complete type mappings. + +### Named Types + +Reference other messages or enums by name: + +```proto +enum Status { ... } +message User { ... 
} + +message Order { + User customer = 1; // Reference to User message + Status status = 2; // Reference to Status enum +} +``` + +### Map Types + +Maps with typed keys and values: + +```proto +message Config { + map<string, string> properties = 1; + map<string, int32> counts = 2; + map<int32, User> users = 3; +} +``` + +**Syntax:** `map<KeyType, ValueType>` + +**Restrictions:** + +- Key type should be a primitive type (typically `string` or integer types) +- Value type can be any type including messages + +## Field Numbers + +Each field must have a unique positive integer identifier: + +```proto +message Example { + string first = 1; + string second = 2; + string third = 3; +} +``` + +**Rules:** + +- Must be unique within a message +- Must be positive integers +- Used for field ordering and identification +- Gaps in numbering are allowed (useful for deprecating fields) + +**Best Practices:** + +- Use sequential numbers starting from 1 +- Reserve number ranges for different categories +- Never reuse numbers for different fields (even after deletion) + +## Type IDs + +Type IDs enable efficient cross-language serialization: + +```proto +enum Color [id=100] { ... } +message User [id=101] { ... } +message Order [id=102] { ... } +``` + +### With Type ID (Recommended) + +```proto +message User [id=101] { ... } +message User [id=101, deprecated=true] { ... } // Multiple options +``` + +- Serialized as compact integer +- Fast lookup during deserialization +- Must be globally unique across all types +- Recommended for production use + +### Without Type ID + +```proto +message Config { ... } +``` + +- Registered using namespace + name +- More flexible for development +- Slightly larger serialized size +- Uses package as namespace: `"package.Config"` + +### ID Assignment Strategy + +```proto +// Enums: 100-199 +enum Status [id=100] { ... } +enum Priority [id=101] { ... } + +// User domain: 200-299 +message User [id=200] { ... } +message UserProfile [id=201] { ... 
} + +// Order domain: 300-399 +message Order [id=300] { ... } +message OrderItem [id=301] { ... } +``` + +## Complete Example + +```proto +// E-commerce domain model +package com.shop.models; + +// Enums with type IDs +enum OrderStatus [id=100] { + PENDING = 0; + CONFIRMED = 1; + SHIPPED = 2; + DELIVERED = 3; + CANCELLED = 4; +} + +enum PaymentMethod [id=101] { + CREDIT_CARD = 0; + DEBIT_CARD = 1; + PAYPAL = 2; + BANK_TRANSFER = 3; +} + +// Messages with type IDs +message Address [id=200] { + string street = 1; + string city = 2; + string state = 3; + string country = 4; + string postal_code = 5; +} + +message Customer [id=201] { + string id = 1; + string name = 2; + optional string email = 3; + optional string phone = 4; + optional Address billing_address = 5; + optional Address shipping_address = 6; +} + +message Product [id=202] { + string sku = 1; + string name = 2; + string description = 3; + float64 price = 4; + int32 stock = 5; + repeated string categories = 6; + map<string, string> attributes = 7; +} + +message OrderItem [id=203] { + ref Product product = 1; // Track reference to avoid duplication + int32 quantity = 2; + float64 unit_price = 3; +} + +message Order [id=204] { + string id = 1; + ref Customer customer = 2; + repeated OrderItem items = 3; + OrderStatus status = 4; + PaymentMethod payment_method = 5; + float64 total = 6; + optional string notes = 7; + timestamp created_at = 8; + optional timestamp shipped_at = 9; +} + +// Config without type ID (uses namespace registration) +message ShopConfig { + string store_name = 1; + string currency = 2; + float64 tax_rate = 3; + repeated string supported_countries = 4; +} +``` + +## Fory Extension Options + +FDL supports protobuf-style extension options for Fory-specific configuration. These use the `(fory)` prefix to indicate they are Fory extensions. 
+ +### File-Level Fory Options + +```proto +option (fory).use_record_for_java_message = true; +option (fory).polymorphism = true; +``` + +| Option | Type | Description | +| ----------------------------- | ---- | ---------------------------------------- | +| `use_record_for_java_message` | bool | Generate Java records instead of classes | +| `polymorphism` | bool | Enable polymorphism for all types | + +### Message-Level Fory Options + +Options can be specified inside the message body: + +```proto +message MyMessage { + option (fory).id = 100; + option (fory).evolving = false; + option (fory).use_record_for_java = true; + string name = 1; +} +``` + +| Option | Type | Description | +| --------------------- | ------ | ----------------------------------------------------------------------------------- | +| `id` | int | Type ID for serialization (sets type_id) | +| `evolving` | bool | Schema evolution support (default: true). When false, schema is fixed like a struct | +| `use_record_for_java` | bool | Generate Java record for this message | +| `deprecated` | bool | Mark this message as deprecated | +| `namespace` | string | Custom namespace for type registration | + +**Note:** `option (fory).id = 100` is equivalent to the inline syntax `message MyMessage [id=100]`. 
+ +### Enum-Level Fory Options + +```proto +enum Status { + option (fory).id = 101; + option (fory).deprecated = true; + UNKNOWN = 0; + ACTIVE = 1; +} +``` + +| Option | Type | Description | +| ------------ | ---- | ---------------------------------------- | +| `id` | int | Type ID for serialization (sets type_id) | +| `deprecated` | bool | Mark this enum as deprecated | + +### Field-Level Fory Options + +Field options are specified in brackets after the field number: + +```proto +message Example { + MyType friend = 1 [(fory).ref = true]; + string nickname = 2 [(fory).nullable = true]; + MyType data = 3 [(fory).ref = true, (fory).nullable = true]; +} +``` + +| Option | Type | Description | +| --------------------- | ---- | --------------------------------------------------------- | +| `ref` | bool | Enable reference tracking (sets ref flag) | +| `nullable` | bool | Mark field as nullable (sets optional flag) | +| `deprecated` | bool | Mark this field as deprecated | +| `thread_safe_pointer` | bool | Rust only: use `Arc` (true) or `Rc` (false) for ref types | + +**Note:** `[(fory).ref = true]` is equivalent to using the `ref` modifier: `ref MyType friend = 1;` +Field-level options always apply to the field/collection; use modifiers after +`repeated` to control element behavior. 
+ +To use `Rc` instead of `Arc` in Rust for a specific field: + +```proto +message Graph { + ref Node root = 1 [(fory).thread_safe_pointer = false]; +} +``` + +### Combining Standard and Fory Options + +You can combine standard options with Fory extension options: + +```proto +message User { + option deprecated = true; // Standard option + option (fory).evolving = false; // Fory extension option + + string name = 1; + MyType data = 2 [deprecated = true, (fory).ref = true]; +} +``` + +### Fory Options Proto File + +For reference, the Fory options are defined in `extension/fory_options.proto`: + +```proto +// File-level options +extend google.protobuf.FileOptions { + optional ForyFileOptions fory = 50001; +} + +message ForyFileOptions { + optional bool use_record_for_java_message = 1; + optional bool polymorphism = 2; +} + +// Message-level options +extend google.protobuf.MessageOptions { + optional ForyMessageOptions fory = 50001; +} + +message ForyMessageOptions { + optional int32 id = 1; + optional bool evolving = 2; + optional bool use_record_for_java = 3; + optional bool deprecated = 4; + optional string namespace = 5; +} + +// Field-level options +extend google.protobuf.FieldOptions { + optional ForyFieldOptions fory = 50001; +} + +message ForyFieldOptions { + optional bool ref = 1; + optional bool nullable = 2; + optional bool deprecated = 3; +} +``` + +## Grammar Summary + +``` +file := [package_decl] file_option* import_decl* type_def* + +package_decl := 'package' package_name ';' +package_name := IDENTIFIER ('.' IDENTIFIER)* + +file_option := 'option' option_name '=' option_value ';' +option_name := IDENTIFIER | extension_name +extension_name := '(' IDENTIFIER ')' '.' 
IDENTIFIER // e.g., (fory).polymorphism + +import_decl := 'import' STRING ';' + +type_def := enum_def | message_def + +enum_def := 'enum' IDENTIFIER [type_options] '{' enum_body '}' +enum_body := (option_stmt | reserved_stmt | enum_value)* +enum_value := IDENTIFIER '=' INTEGER ';' + +message_def := 'message' IDENTIFIER [type_options] '{' message_body '}' +message_body := (option_stmt | reserved_stmt | nested_type | field_def)* +nested_type := enum_def | message_def +field_def := [modifiers] field_type IDENTIFIER '=' INTEGER [field_options] ';' + +option_stmt := 'option' option_name '=' option_value ';' +option_value := 'true' | 'false' | IDENTIFIER | INTEGER | STRING + +reserved_stmt := 'reserved' reserved_items ';' +reserved_items := reserved_item (',' reserved_item)* +reserved_item := INTEGER | INTEGER 'to' INTEGER | INTEGER 'to' 'max' | STRING + +modifiers := { 'optional' | 'ref' } ['repeated' { 'optional' | 'ref' }] + +field_type := primitive_type | named_type | map_type +primitive_type := 'bool' | 'int8' | 'int16' | 'int32' | 'int64' + | 'float32' | 'float64' | 'string' | 'bytes' + | 'date' | 'timestamp' +named_type := qualified_name +qualified_name := IDENTIFIER ('.' IDENTIFIER)* // e.g., Parent.Child +map_type := 'map' '<' field_type ',' field_type '>' + +type_options := '[' type_option (',' type_option)* ']' +type_option := IDENTIFIER '=' option_value // e.g., id=100, deprecated=true +field_options := '[' field_option (',' field_option)* ']' +field_option := option_name '=' option_value // e.g., deprecated=true, (fory).ref=true + +STRING := '"' [^"\n]* '"' | "'" [^'\n]* "'" +IDENTIFIER := [a-zA-Z_][a-zA-Z0-9_]* +INTEGER := '-'? 
[0-9]+ +``` diff --git a/docs/compiler/generated-code.md b/docs/compiler/generated-code.md new file mode 100644 index 000000000..617a3421c --- /dev/null +++ b/docs/compiler/generated-code.md @@ -0,0 +1,828 @@ +--- +title: Generated Code Reference +sidebar_position: 5 +id: fdl_generated_code +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +This document explains the code generated by the FDL compiler for each target language. + +## Example Schema + +The examples in this document use this FDL schema: + +```proto +package demo; + +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} + +message User [id=101] { + string id = 1; + string name = 2; + optional string email = 3; + int32 age = 4; +} + +message Order [id=102] { + string id = 1; + ref User customer = 2; + repeated string items = 3; + map<string, int32> quantities = 4; + Status status = 5; +} +``` + +## Enum Prefix Stripping + +When enum values use a protobuf-style prefix (enum name in UPPER_SNAKE_CASE), the compiler automatically strips the prefix for languages with scoped enums. This produces cleaner, more idiomatic code. 
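The stripping rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the compiler's code: `to_upper_snake` and `strip_enum_prefix` are hypothetical helpers, and the all-or-nothing behavior (strip only when every value carries the prefix) is an assumption rather than documented behavior.

```python
import re


def to_upper_snake(name: str) -> str:
    """Convert a PascalCase enum name such as DeviceTier to DEVICE_TIER."""
    # Insert an underscore at each lower-case/digit to upper-case boundary.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).upper()


def strip_enum_prefix(enum_name: str, value_names: list) -> list:
    """Drop the enum-name prefix from every value, if all values have it."""
    prefix = to_upper_snake(enum_name) + "_"
    has_prefix = all(
        value.startswith(prefix) and len(value) > len(prefix)
        for value in value_names
    )
    if not has_prefix:
        # Assumption: leave the names untouched unless the prefix is uniform.
        return list(value_names)
    return [value[len(prefix):] for value in value_names]


print(strip_enum_prefix(
    "DeviceTier",
    ["DEVICE_TIER_UNKNOWN", "DEVICE_TIER_TIER1", "DEVICE_TIER_TIER2"],
))
```

A target with unscoped constants, such as Go, would then re-add the enum name in its own casing instead of using the stripped names directly.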
+ +**Input FDL:** + +```proto +enum DeviceTier { + DEVICE_TIER_UNKNOWN = 0; + DEVICE_TIER_TIER1 = 1; + DEVICE_TIER_TIER2 = 2; +} +``` + +**Generated output by language:** + +| Language | Generated Values | Notes | +| -------- | ----------------------------------------- | ------------------------- | +| Java | `UNKNOWN, TIER1, TIER2` | Scoped enum | +| Rust | `Unknown, Tier1, Tier2` | PascalCase variants | +| C++ | `UNKNOWN, TIER1, TIER2` | Scoped `enum class` | +| Python | `UNKNOWN, TIER1, TIER2` | Scoped `IntEnum` | +| Go | `DeviceTierUnknown, DeviceTierTier1, ...` | Unscoped, prefix re-added | + +**Note:** Go uses unscoped constants, so the enum name prefix is added back to avoid naming collisions. + +## Nested Types + +When using nested message and enum definitions, the generated code varies by language. + +**Input FDL:** + +```proto +message SearchResponse { + message Result { + string url = 1; + string title = 2; + } + repeated Result results = 1; +} +``` + +### Java - Inner Classes + +```java +public class SearchResponse { + public static class Result { + private String url; + private String title; + // getters, setters... + } + + private List<Result> results; + // getters, setters... +} +``` + +### Python - Nested Classes + +```python +@dataclass +class SearchResponse: + @dataclass + class Result: + url: str = "" + title: str = "" + + results: List[Result] = field(default_factory=list) +``` + +### Go - Underscore + +```go +type SearchResponse_Result struct { + Url string + Title string +} + +type SearchResponse struct { + Results []SearchResponse_Result +} +``` + +**Note:** Set `option (fory).go_nested_type_style = "camelcase";` to generate `SearchResponseResult` instead. 
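The two Go naming styles differ only in how the nesting path is joined. A minimal Python sketch of that difference follows; `go_nested_type_name` is a hypothetical helper, and the style name `"underscore"` for the default is an assumption (only `"camelcase"` appears in the note above).

```python
def go_nested_type_name(path, style="underscore"):
    """Build a Go type name from a nesting path like ["SearchResponse", "Result"]."""
    if style == "underscore":
        # Default shown above: parent and child joined with an underscore.
        return "_".join(path)
    if style == "camelcase":
        # Alternative style: names concatenated without a separator.
        return "".join(path)
    raise ValueError(f"unsupported style: {style!r}")


print(go_nested_type_name(["SearchResponse", "Result"]))
print(go_nested_type_name(["SearchResponse", "Result"], style="camelcase"))
```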
+ +### Rust - Nested Module + +```rust +pub mod search_response { + use super::*; + + #[derive(ForyObject)] + pub struct Result { + pub url: String, + pub title: String, + } +} + +#[derive(ForyObject)] +pub struct SearchResponse { + pub results: Vec<search_response::Result>, +} +``` + +### C++ - Nested Classes + +```cpp +class SearchResponse final { + public: + class Result final { + public: + std::string url; + std::string title; + }; + + std::vector<Result> results; +}; + +FORY_STRUCT(SearchResponse::Result, url, title); +FORY_STRUCT(SearchResponse, results); +``` + +**Summary:** + +| Language | Approach | Syntax Example | +| -------- | ------------------------- | ------------------------- | +| Java | Static inner classes | `SearchResponse.Result` | +| Python | Nested dataclasses | `SearchResponse.Result` | +| Go | Underscore (configurable) | `SearchResponse_Result` | +| Rust | Nested module | `search_response::Result` | +| C++ | Nested classes | `SearchResponse::Result` | + +## Java + +### Enum Generation + +```java +package demo; + +public enum Status { + PENDING, + ACTIVE, + COMPLETED; +} +``` + +### Message Generation + +```java +package demo; + +import java.util.List; +import java.util.Map; +import org.apache.fory.annotation.ForyField; + +public class User { + private String id; + private String name; + + @ForyField(nullable = true) + private String email; + + private int age; + + public User() { + } + + public String getId() { + return id; + } + + public void setId(String id) { + this.id = id; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public String getEmail() { + return email; + } + + public void setEmail(String email) { + this.email = email; + } + + public int getAge() { + return age; + } + + public void setAge(int age) { + this.age = age; + } +} +``` + +```java +package demo; + +import java.util.List; +import java.util.Map; +import org.apache.fory.annotation.ForyField; + +public 
class Order { + private String id; + + @ForyField(ref = true) + private User customer; + + private List<String> items; + private Map<String, Integer> quantities; + private Status status; + + public Order() { + } + + // Getters and setters... +} +``` + +### Registration Helper + +```java +package demo; + +import org.apache.fory.Fory; + +public class DemoForyRegistration { + + public static void register(Fory fory) { + fory.register(Status.class, 100); + fory.register(User.class, 101); + fory.register(Order.class, 102); + } +} +``` + +### Usage + +```java +import demo.*; +import org.apache.fory.Fory; +import org.apache.fory.config.Language; + +public class Example { + public static void main(String[] args) { + Fory fory = Fory.builder() + .withLanguage(Language.XLANG) + .withRefTracking(true) + .build(); + + DemoForyRegistration.register(fory); + + User user = new User(); + user.setId("u123"); + user.setName("Alice"); + user.setAge(30); + + Order order = new Order(); + order.setId("o456"); + order.setCustomer(user); + order.setStatus(Status.ACTIVE); + + byte[] bytes = fory.serialize(order); + Order restored = (Order) fory.deserialize(bytes); + } +} +``` + +## Python + +### Module Generation + +```python +# Licensed to the Apache Software Foundation (ASF)... 
+ +from dataclasses import dataclass +from enum import IntEnum +from typing import Dict, List, Optional +import pyfory + + +class Status(IntEnum): + PENDING = 0 + ACTIVE = 1 + COMPLETED = 2 + + +@dataclass +class User: + id: str = "" + name: str = "" + email: Optional[str] = None + age: pyfory.int32 = 0 + + +@dataclass +class Order: + id: str = "" + customer: Optional[User] = None + items: List[str] = None + quantities: Dict[str, pyfory.int32] = None + status: Status = None + + +def register_demo_types(fory: pyfory.Fory): + fory.register_type(Status, type_id=100) + fory.register_type(User, type_id=101) + fory.register_type(Order, type_id=102) +``` + +### Usage + +```python +import pyfory +from demo import User, Order, Status, register_demo_types + +fory = pyfory.Fory(ref_tracking=True) +register_demo_types(fory) + +user = User(id="u123", name="Alice", age=30) +order = Order( + id="o456", + customer=user, + items=["item1", "item2"], + quantities={"item1": 2, "item2": 1}, + status=Status.ACTIVE +) + +data = fory.serialize(order) +restored = fory.deserialize(data) +``` + +## Go + +### File Generation + +```go +// Licensed to the Apache Software Foundation (ASF)... 
+ +package demo + +import ( + fory "github.com/apache/fory/go/fory" +) + +type Status int32 + +const ( + StatusPending Status = 0 + StatusActive Status = 1 + StatusCompleted Status = 2 +) + +type User struct { + Id string + Name string + Email *string `fory:"nullable"` + Age int32 +} + +type Order struct { + Id string + Customer *User `fory:"ref"` + Items []string + Quantities map[string]int32 + Status Status +} + +func RegisterTypes(f *fory.Fory) error { + if err := f.RegisterEnum(Status(0), 100); err != nil { + return err + } + if err := f.Register(User{}, 101); err != nil { + return err + } + if err := f.Register(Order{}, 102); err != nil { + return err + } + return nil +} +``` + +### Usage + +```go +package main + +import ( + "demo" + fory "github.com/apache/fory/go/fory" +) + +func main() { + f := fory.NewFory(true) // Enable ref tracking + + if err := demo.RegisterTypes(f); err != nil { + panic(err) + } + + email := "[email protected]" + user := &demo.User{ + Id: "u123", + Name: "Alice", + Email: &email, + Age: 30, + } + + order := &demo.Order{ + Id: "o456", + Customer: user, + Items: []string{"item1", "item2"}, + Quantities: map[string]int32{ + "item1": 2, + "item2": 1, + }, + Status: demo.StatusActive, + } + + bytes, err := f.Marshal(order) + if err != nil { + panic(err) + } + + var restored demo.Order + if err := f.Unmarshal(bytes, &restored); err != nil { + panic(err) + } +} +``` + +## Rust + +### Module Generation + +```rust +// Licensed to the Apache Software Foundation (ASF)... 
+ +use fory::{Fory, ForyObject}; +use std::collections::HashMap; +use std::sync::Arc; + +#[derive(ForyObject, Debug, Clone, PartialEq, Default)] +#[repr(i32)] +pub enum Status { + #[default] + Pending = 0, + Active = 1, + Completed = 2, +} + +#[derive(ForyObject, Debug, Clone, PartialEq, Default)] +pub struct User { + pub id: String, + pub name: String, + #[fory(nullable = true)] + pub email: Option<String>, + pub age: i32, +} + +#[derive(ForyObject, Debug, Clone, PartialEq, Default)] +pub struct Order { + pub id: String, + pub customer: Arc<User>, + pub items: Vec<String>, + pub quantities: HashMap<String, i32>, + pub status: Status, +} + +pub fn register_types(fory: &mut Fory) -> Result<(), fory::Error> { + fory.register::<Status>(100)?; + fory.register::<User>(101)?; + fory.register::<Order>(102)?; + Ok(()) +} +``` + +**Note:** Rust uses `Arc` by default for `ref` fields. Set +`[(fory).thread_safe_pointer = false]` to generate `Rc` instead. + +### Usage + +```rust +use demo::{User, Order, Status, register_types}; +use fory::Fory; +use std::sync::Arc; +use std::collections::HashMap; + +fn main() -> Result<(), fory::Error> { + let mut fory = Fory::default(); + register_types(&mut fory)?; + + let user = Arc::new(User { + id: "u123".to_string(), + name: "Alice".to_string(), + email: Some("[email protected]".to_string()), + age: 30, + }); + + let mut quantities = HashMap::new(); + quantities.insert("item1".to_string(), 2); + quantities.insert("item2".to_string(), 1); + + let order = Order { + id: "o456".to_string(), + customer: user, + items: vec!["item1".to_string(), "item2".to_string()], + quantities, + status: Status::Active, + }; + + let bytes = fory.serialize(&order)?; + let restored: Order = fory.deserialize(&bytes)?; + + Ok(()) +} +``` + +## C++ + +### Header Generation + +```cpp +/* + * Licensed to the Apache Software Foundation (ASF)... 
+ */ + +#ifndef DEMO_H_ +#define DEMO_H_ + +#include <cstdint> +#include <map> +#include <memory> +#include <optional> +#include <string> +#include <vector> +#include "fory/serialization/fory.h" + +namespace demo { + +struct User; +struct Order; + +enum class Status : int32_t { + PENDING = 0, + ACTIVE = 1, + COMPLETED = 2, +}; +FORY_ENUM(Status, PENDING, ACTIVE, COMPLETED); + +struct User { + std::string id; + std::string name; + std::optional<std::string> email; + int32_t age; + + bool operator==(const User& other) const { + return id == other.id && name == other.name && + email == other.email && age == other.age; + } +}; +FORY_STRUCT(User, id, name, email, age); + +struct Order { + std::string id; + std::shared_ptr<User> customer; + std::vector<std::string> items; + std::map<std::string, int32_t> quantities; + Status status; + + bool operator==(const Order& other) const { + return id == other.id && customer == other.customer && + items == other.items && quantities == other.quantities && + status == other.status; + } +}; +FORY_STRUCT(Order, id, customer, items, quantities, status); + +inline void RegisterTypes(fory::serialization::Fory& fory) { + fory.register_enum<Status>(100); + fory.register_struct<User>(101); + fory.register_struct<Order>(102); +} + +} // namespace demo + +#endif // DEMO_H_ +``` + +### Usage + +```cpp +#include "demo.h" +#include <iostream> + +int main() { + fory::serialization::Fory fory = fory::serialization::Fory::builder() + .xlang(true) + .ref_tracking(true) + .build(); + + demo::RegisterTypes(fory); + + auto user = std::make_shared<demo::User>(); + user->id = "u123"; + user->name = "Alice"; + user->email = "[email protected]"; + user->age = 30; + + demo::Order order; + order.id = "o456"; + order.customer = user; + order.items = {"item1", "item2"}; + order.quantities = {{"item1", 2}, {"item2", 1}}; + order.status = demo::Status::ACTIVE; + + auto bytes = fory.serialize(order); + auto restored = fory.deserialize<demo::Order>(bytes); + + 
return 0; +} +``` + +## Generated Annotations Summary + +### Java Annotations + +| Annotation | Purpose | +| ----------------------------- | -------------------------- | +| `@ForyField(nullable = true)` | Marks field as nullable | +| `@ForyField(ref = true)` | Enables reference tracking | + +### Python Type Hints + +| Hint | Purpose | +| -------------- | ------------------- | +| `Optional[T]` | Nullable field | +| `List[T]` | Repeated field | +| `Dict[K, V]` | Map field | +| `pyfory.int32` | Fixed-width integer | + +### Go Struct Tags + +| Tag | Purpose | +| ----------------- | -------------------------- | +| `fory:"nullable"` | Marks field as nullable | +| `fory:"ref"` | Enables reference tracking | + +### Rust Attributes + +| Attribute | Purpose | +| -------------------------- | -------------------------- | +| `#[derive(ForyObject)]` | Enables Fory serialization | +| `#[fory(nullable = true)]` | Marks field as nullable | +| `#[tag("...")]` | Name-based registration | +| `#[repr(i32)]` | Enum representation | + +### C++ Macros + +| Macro | Purpose | +| ---------------------------- | ----------------------- | +| `FORY_STRUCT(T[, fields..])` | Registers struct fields | +| `FORY_ENUM(T, values..)` | Registers enum values | + +## Name-Based Registration + +When types don't have explicit type IDs, they use namespace-based registration: + +### FDL + +```proto +package myapp.models; + +message Config { // No type ID + string key = 1; + string value = 2; +} +``` + +### Generated Registration + +**Java:** + +```java +fory.register(Config.class, "myapp.models", "Config"); +``` + +**Python:** + +```python +fory.register_type(Config, namespace="myapp.models", typename="Config") +``` + +**Go:** + +```go +f.RegisterTagType("myapp.models.Config", Config{}) +``` + +**Rust:** + +```rust +#[derive(ForyObject)] +#[tag("myapp.models.Config")] +pub struct Config { ...
} +``` + +**C++:** + +```cpp +fory.register_struct<Config>("myapp.models", "Config"); +``` + +## Customization + +### Extending Generated Code + +Generated code can be extended through language-specific mechanisms: + +**Java:** Use inheritance or composition: + +```java +public class ExtendedUser extends User { + public String getDisplayName() { + return getName() + " <" + getEmail() + ">"; + } +} +``` + +**Python:** Add methods after import: + +```python +from demo import User + +def get_display_name(self): + return f"{self.name} <{self.email}>" + +User.get_display_name = get_display_name +``` + +**Go:** Use separate file in same package: + +```go +package demo + +func (u *User) DisplayName() string { + return u.Name + " <" + *u.Email + ">" +} +``` + +**Rust:** Use trait extensions: + +```rust +trait UserExt { + fn display_name(&self) -> String; +} + +impl UserExt for User { + fn display_name(&self) -> String { + format!("{} <{}>", self.name, self.email.as_deref().unwrap_or("")) + } +} +``` + +**C++:** Use inheritance or free functions: + +```cpp +std::string display_name(const demo::User& user) { + return user.name + " <" + user.email.value_or("") + ">"; +} +``` diff --git a/docs/compiler/index.md b/docs/compiler/index.md new file mode 100644 index 000000000..be38c9b64 --- /dev/null +++ b/docs/compiler/index.md @@ -0,0 +1,206 @@ +--- +title: FDL Schema Guide +sidebar_position: 1 +id: schema_index +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +Fory Definition Language (FDL) is a schema definition language for Apache Fory that enables type-safe cross-language serialization. Define your data structures once and generate native data structure code for Java, Python, Go, Rust, and C++. + +## Overview + +FDL provides a simple, intuitive syntax for defining cross-language data structures: + +```proto +package example; + +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} + +message User [id=101] { + string name = 1; + int32 age = 2; + optional string email = 3; + repeated string tags = 4; +} + +message Order [id=102] { + ref User customer = 1; + repeated Item items = 2; + Status status = 3; + map<string, int32> metadata = 4; +} +``` + +## Why FDL? + +### Schema-First Development + +Define your data model once in FDL and generate consistent, type-safe code across all languages. 
This ensures: + +- **Type Safety**: Catch type errors at compile time, not runtime +- **Consistency**: All languages use the same field names, types, and structures +- **Documentation**: Schema serves as living documentation +- **Evolution**: Managed schema changes across all implementations + +### Fory-Native Features + +Unlike generic IDLs, FDL is designed specifically for Fory serialization: + +- **Reference Tracking**: First-class support for shared and circular references via `ref` +- **Nullable Fields**: Explicit `optional` modifier for nullable types +- **Type Registration**: Built-in support for both numeric IDs and namespace-based registration +- **Native Code Generation**: Generates idiomatic code with Fory annotations/macros + +### Zero Runtime Overhead + +Generated code uses native language constructs: + +- Java: Plain POJOs with `@ForyField` annotations +- Python: Dataclasses with type hints +- Go: Structs with struct tags +- Rust: Structs with `#[derive(ForyObject)]` +- C++: Structs with `FORY_STRUCT` macros + +## Quick Start + +### 1. Install the Compiler + +```bash +cd compiler +pip install -e . +``` + +### 2. Write Your Schema + +Create `example.fdl`: + +```proto +package example; + +message Person [id=100] { + string name = 1; + int32 age = 2; + optional string email = 3; +} +``` + +### 3. Generate Code + +```bash +# Generate for all languages +fory compile example.fdl --output ./generated + +# Generate for specific languages +fory compile example.fdl --lang java,python --output ./generated +``` + +### 4. 
Use Generated Code + +**Java:** + +```java +Fory fory = Fory.builder().withLanguage(Language.XLANG).build(); +ExampleForyRegistration.register(fory); + +Person person = new Person(); +person.setName("Alice"); +person.setAge(30); +byte[] data = fory.serialize(person); +``` + +**Python:** + +```python +import pyfory +from example import Person, register_example_types + +fory = pyfory.Fory() +register_example_types(fory) + +person = Person(name="Alice", age=30) +data = fory.serialize(person) +``` + +## Documentation + +| Document | Description | +| ------------------------------------------ | -------------------------------------------- | +| [FDL Syntax Reference](fdl-syntax.md) | Complete language syntax and grammar | +| [Type System](type-system.md) | Primitive types, collections, and type rules | +| [Compiler Guide](compiler-guide.md) | CLI options and build integration | +| [Generated Code](generated-code.md) | Output format for each target language | +| [Protocol Buffers vs FDL](proto-vs-fdl.md) | Comparison with protobuf and migration guide | + +## Key Concepts + +### Type Registration + +FDL supports two registration modes: + +**Numeric Type IDs** - Fast and compact: + +```proto +message User [id=100] { ... } // Registered with ID 100 +``` + +**Namespace-based** - Flexible and readable: + +```proto +message Config { ... 
} // Registered as "package.Config" +``` + +### Field Modifiers + +- **`optional`**: Field can be null/None +- **`ref`**: Enable reference tracking for shared/circular references +- **`repeated`**: Field is a list/array + +```proto +message Example { + optional string nullable = 1; + ref Node parent = 2; + repeated int32 numbers = 3; +} +``` + +### Cross-Language Compatibility + +FDL types map to native types in each language: + +| FDL Type | Java | Python | Go | Rust | C++ | +| -------- | --------- | ------ | -------- | -------- | ------------- | +| `int32` | `int` | `int` | `int32` | `i32` | `int32_t` | +| `string` | `String` | `str` | `string` | `String` | `std::string` | +| `bool` | `boolean` | `bool` | `bool` | `bool` | `bool` | + +See [Type System](type-system.md) for complete mappings. + +## Best Practices + +1. **Use meaningful package names**: Group related types together +2. **Assign type IDs for performance**: Numeric IDs are faster than name-based registration +3. **Reserve ID ranges**: Leave gaps for future additions (e.g., 100-199 for users, 200-299 for orders) +4. **Use `optional` explicitly**: Make nullability clear in the schema +5. **Use `ref` for shared objects**: Enable reference tracking when objects are shared + +## Examples + +See the [examples](https://github.com/apache/fory/tree/main/compiler/examples) directory for complete working examples. diff --git a/docs/compiler/proto-vs-fdl.md b/docs/compiler/proto-vs-fdl.md new file mode 100644 index 000000000..5b87062dd --- /dev/null +++ b/docs/compiler/proto-vs-fdl.md @@ -0,0 +1,507 @@ +--- +title: Protocol Buffers vs FDL +sidebar_position: 6 +id: proto_vs_fdl +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. 
+ The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +This document compares Google's Protocol Buffers (protobuf) with Fory Definition Language (FDL), helping you understand when to use each and how to migrate between them. + +## Overview + +| Aspect | Protocol Buffers | FDL | +| ---------------------- | --------------------------------- | ----------------------------------- | +| **Primary Purpose** | RPC and message interchange | Cross-language object serialization | +| **Design Philosophy** | Schema evolution, backward compat | Performance, native integration | +| **Reference Tracking** | Not supported | First-class support (`ref`) | +| **Generated Code** | Custom message types | Native language constructs | +| **Serialization** | Tag-length-value encoding | Fory binary protocol | +| **Performance** | Good | Excellent (up to 170x faster) | + +## Syntax Comparison + +### Package Declaration + +**Protocol Buffers:** + +```protobuf +syntax = "proto3"; +package example.models; +option java_package = "com.example.models"; +option go_package = "example.com/models"; +``` + +**FDL:** + +```proto +package example.models; +``` + +FDL uses a single package declaration that maps to all languages automatically. 
+ +### Enum Definition + +**Protocol Buffers:** + +```protobuf +enum Status { + STATUS_UNSPECIFIED = 0; + STATUS_PENDING = 1; + STATUS_ACTIVE = 2; + STATUS_COMPLETED = 3; +} +``` + +**FDL:** + +```proto +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} +``` + +Key differences: + +- FDL supports optional type IDs (`[id=100]`) for efficient serialization +- Protobuf conventionally reserves `_UNSPECIFIED = 0` as the first value; FDL has no such convention, so the first real value can be `0` +- FDL enum values don't require prefixes + +### Message Definition + +**Protocol Buffers:** + +```protobuf +message User { + string id = 1; + string name = 2; + optional string email = 3; + int32 age = 4; + repeated string tags = 5; + map<string, string> metadata = 6; +} +``` + +**FDL:** + +```proto +message User [id=101] { + string id = 1; + string name = 2; + optional string email = 3; + int32 age = 4; + repeated string tags = 5; + map<string, string> metadata = 6; +} +``` + +Syntax is nearly identical, but FDL adds: + +- Type IDs (`[id=101]`) for cross-language registration +- `ref` modifier for reference tracking + +### Nested Types + +**Protocol Buffers:** + +```protobuf +message Order { + message Item { + string product_id = 1; + int32 quantity = 2; + } + repeated Item items = 1; +} +``` + +**FDL:** + +```proto +message OrderItem [id=200] { + string product_id = 1; + int32 quantity = 2; +} + +message Order [id=201] { + repeated OrderItem items = 1; +} +``` + +FDL supports nested types, but generators may flatten them for languages where nested types are not idiomatic. + +### Imports + +**Protocol Buffers:** + +```protobuf +import "other.proto"; +import "google/protobuf/timestamp.proto"; +``` + +**FDL:** + +FDL currently requires all types to be defined in a single file; types within that file can refer to each other through forward references. 
+ +## Feature Comparison + +### Reference Tracking + +FDL's killer feature is first-class reference tracking: + +**FDL:** + +```proto +message TreeNode [id=300] { + string value = 1; + ref TreeNode parent = 2; + repeated ref TreeNode children = 3; // Element refs + ref repeated TreeNode path = 4; // Collection ref +} + +message Graph [id=301] { + repeated ref Node nodes = 1; // Shared references preserved (elements) +} +``` + +**Protocol Buffers:** + +Protobuf cannot represent circular or shared references. You must use workarounds: + +```protobuf +// Workaround: Use IDs instead of references +message TreeNode { + string id = 1; + string value = 2; + string parent_id = 3; // Manual ID reference + repeated string child_ids = 4; +} +``` + +### Type System + +| Type | Protocol Buffers | FDL | +| ---------- | ------------------------------------------------------------------------------------------------------ | --------------------------------- | +| Boolean | `bool` | `bool` | +| Integers | `int32`, `int64`, `sint32`, `sint64`, `uint32`, `uint64`, `fixed32`, `fixed64`, `sfixed32`, `sfixed64` | `int8`, `int16`, `int32`, `int64` | +| Floats | `float`, `double` | `float32`, `float64` | +| String | `string` | `string` | +| Binary | `bytes` | `bytes` | +| Timestamp | `google.protobuf.Timestamp` | `timestamp` | +| Date | Not built-in | `date` | +| Duration | `google.protobuf.Duration` | Not built-in | +| List | `repeated T` | `repeated T` | +| Map | `map<K, V>` | `map<K, V>` | +| Nullable | `optional T` (proto3) | `optional T` | +| Oneof | `oneof` | Not supported | +| Any | `google.protobuf.Any` | Not supported | +| Extensions | `extend` | Not supported | + +### Wire Format + +**Protocol Buffers:** + +- Tag-length-value encoding +- Variable-length integers (varints) +- Field numbers encoded in wire format +- Unknown fields preserved + +**FDL/Fory:** + +- Optimized binary format +- Schema-aware encoding +- Type IDs for fast lookup +- Reference tracking support +- Zero-copy 
deserialization where possible + +### Generated Code Style + +**Protocol Buffers** generates custom types with builders and accessors: + +```java +// Protobuf generated Java +User user = User.newBuilder() + .setId("u123") + .setName("Alice") + .setAge(30) + .build(); +``` + +**FDL** generates native POJOs: + +```java +// FDL generated Java +User user = new User(); +user.setId("u123"); +user.setName("Alice"); +user.setAge(30); +``` + +### Comparison Table + +| Feature | Protocol Buffers | FDL | +| -------------------------- | ----------------- | --------- | +| Schema evolution | Excellent | Good | +| Backward compatibility | Excellent | Good | +| Reference tracking | No | Yes | +| Circular references | No | Yes | +| Native code generation | No (custom types) | Yes | +| Unknown field preservation | Yes | No | +| Schema-less mode | No | Yes\* | +| RPC integration (gRPC) | Yes | No | +| Zero-copy deserialization | Limited | Yes | +| Human-readable format | JSON, TextFormat | No | +| Performance | Good | Excellent | + +\*Fory supports schema-less serialization without FDL + +## When to Use Each + +### Use Protocol Buffers When: + +1. **Building gRPC services**: Protobuf is the native format for gRPC +2. **Maximum backward compatibility**: Protobuf's unknown field handling is robust +3. **Schema evolution is critical**: Adding/removing fields across versions +4. **You need oneof/Any types**: Complex polymorphism requirements +5. **Human-readable debugging**: TextFormat and JSON transcoding available +6. **Ecosystem integration**: Wide tooling support (linting, documentation) + +### Use FDL/Fory When: + +1. **Performance is critical**: Up to 170x faster than protobuf +2. **Cross-language object graphs**: Serialize Java objects, deserialize in Python +3. **Circular/shared references**: Object graphs with cycles +4. **Native code preferred**: Standard POJOs, dataclasses, structs +5. **Memory efficiency**: Zero-copy deserialization +6. 
**Existing object models**: Minimal changes to existing code + +## Performance Comparison + +Benchmarks show Fory significantly outperforms Protocol Buffers: + +| Benchmark | Protocol Buffers | Fory | Improvement | +| ------------------------- | ---------------- | -------- | ----------- | +| Serialization (simple) | 1x | 10-20x | 10-20x | +| Deserialization (simple) | 1x | 10-20x | 10-20x | +| Serialization (complex) | 1x | 50-100x | 50-100x | +| Deserialization (complex) | 1x | 50-100x | 50-100x | +| Memory allocation | 1x | 0.1-0.5x | 2-10x less | + +_Benchmarks vary based on data structure and language. See [Fory benchmarks](../benchmarks/) for details._ + +## Migration Guide + +### From Protocol Buffers to FDL + +#### Step 1: Convert Syntax + +**Before (proto):** + +```protobuf +syntax = "proto3"; +package myapp; + +message Person { + string name = 1; + int32 age = 2; + repeated string emails = 3; + Address address = 4; +} + +message Address { + string street = 1; + string city = 2; +} +``` + +**After (FDL):** + +```proto +package myapp; + +message Address [id=100] { + string street = 1; + string city = 2; +} + +message Person [id=101] { + string name = 1; + int32 age = 2; + repeated string emails = 3; + Address address = 4; +} +``` + +#### Step 2: Handle Special Cases + +**oneof fields:** + +```protobuf +// Proto +message Result { + oneof result { + Success success = 1; + Error error = 2; + } +} +``` + +```proto +// FDL - Use separate optional fields +message Result [id=102] { + optional Success success = 1; + optional Error error = 2; +} +// Or model as sealed class hierarchy in generated code +``` + +**Well-known types:** + +```protobuf +// Proto +import "google/protobuf/timestamp.proto"; +message Event { + google.protobuf.Timestamp created_at = 1; +} +``` + +```proto +// FDL +message Event [id=103] { + timestamp created_at = 1; +} +``` + +#### Step 3: Add Type IDs + +Assign unique type IDs for cross-language compatibility: + +```proto +// Reserve ranges for 
different domains +// 100-199: Common types +// 200-299: User domain +// 300-399: Order domain + +message Address [id=100] { ... } +message Person [id=200] { ... } +message Order [id=300] { ... } +``` + +#### Step 4: Update Build Configuration + +**Before (Maven with protobuf):** + +```xml +<plugin> + <groupId>org.xolstice.maven.plugins</groupId> + <artifactId>protobuf-maven-plugin</artifactId> + <!-- ... --> +</plugin> +``` + +**After (Maven with FDL):** + +```xml +<plugin> + <groupId>org.codehaus.mojo</groupId> + <artifactId>exec-maven-plugin</artifactId> + <executions> + <execution> + <id>generate-fory-types</id> + <phase>generate-sources</phase> + <goals><goal>exec</goal></goals> + <configuration> + <executable>fory</executable> + <arguments> + <argument>compile</argument> + <argument>${project.basedir}/src/main/fdl/schema.fdl</argument> + <argument>--lang</argument> + <argument>java</argument> + <argument>--output</argument> + <argument>${project.build.directory}/generated-sources/fdl</argument> + </arguments> + </configuration> + </execution> + </executions> +</plugin> +``` + +#### Step 5: Update Application Code + +**Before (Protobuf Java):** + +```java +// Protobuf style +Person.Builder builder = Person.newBuilder(); +builder.setName("Alice"); +builder.setAge(30); +Person person = builder.build(); + +byte[] data = person.toByteArray(); +Person restored = Person.parseFrom(data); +``` + +**After (Fory Java):** + +```java +// Fory style +Person person = new Person(); +person.setName("Alice"); +person.setAge(30); + +Fory fory = Fory.builder().withLanguage(Language.XLANG).build(); +MyappForyRegistration.register(fory); + +byte[] data = fory.serialize(person); +Person restored = (Person) fory.deserialize(data); +``` + +### Coexistence Strategy + +For gradual migration, you can run both systems in parallel: + +```java +// Dual serialization during migration +public byte[] serialize(Object obj, Format format) { + if (format == Format.PROTOBUF) { + return 
((MessageLite) obj).toByteArray(); + } else { + return fory.serialize(obj); + } +} + +// Convert between formats +public ForyPerson fromProto(ProtoPerson proto) { + ForyPerson person = new ForyPerson(); + person.setName(proto.getName()); + person.setAge(proto.getAge()); + return person; +} +``` + +## Summary + +| Aspect | Choose Protocol Buffers | Choose FDL/Fory | +| ---------------- | ----------------------- | ---------------------- | +| Use case | RPC, API contracts | Object serialization | +| Performance | Acceptable | Critical | +| References | Not needed | Circular/shared needed | +| Code style | Builder pattern OK | Native POJOs preferred | +| Schema evolution | Complex requirements | Simpler requirements | +| Ecosystem | Need gRPC, tooling | Need raw performance | + +Both tools excel in their domains. Protocol Buffers shines for RPC and API contracts with strong schema evolution guarantees. FDL/Fory excels at high-performance object serialization with native language integration and reference tracking support. diff --git a/docs/compiler/type-system.md b/docs/compiler/type-system.md new file mode 100644 index 000000000..b47a8f05f --- /dev/null +++ b/docs/compiler/type-system.md @@ -0,0 +1,449 @@ +--- +title: FDL Type System +sidebar_position: 3 +id: fdl_type_system +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ See the License for the specific language governing permissions and + limitations under the License. +--- + +This document describes the FDL type system and how types map to each target language. + +## Overview + +FDL provides a rich type system designed for cross-language compatibility: + +- **Primitive Types**: Basic scalar types (integers, floats, strings, etc.) +- **Enum Types**: Named integer constants +- **Message Types**: Structured compound types +- **Collection Types**: Lists and maps +- **Nullable Types**: Optional/nullable variants + +## Primitive Types + +### Boolean + +```proto +bool is_active = 1; +``` + +| Language | Type | Notes | +| -------- | --------------------- | ------------------ | +| Java | `boolean` / `Boolean` | Primitive or boxed | +| Python | `bool` | | +| Go | `bool` | | +| Rust | `bool` | | +| C++ | `bool` | | + +### Integer Types + +FDL provides fixed-width signed integers: + +| FDL Type | Size | Range | +| -------- | ------ | ----------------- | +| `int8` | 8-bit | -128 to 127 | +| `int16` | 16-bit | -32,768 to 32,767 | +| `int32` | 32-bit | -2^31 to 2^31 - 1 | +| `int64` | 64-bit | -2^63 to 2^63 - 1 | + +**Language Mapping:** + +| FDL | Java | Python | Go | Rust | C++ | +| ------- | ------- | -------------- | ------- | ----- | --------- | +| `int8` | `byte` | `pyfory.int8` | `int8` | `i8` | `int8_t` | +| `int16` | `short` | `pyfory.int16` | `int16` | `i16` | `int16_t` | +| `int32` | `int` | `pyfory.int32` | `int32` | `i32` | `int32_t` | +| `int64` | `long` | `pyfory.int64` | `int64` | `i64` | `int64_t` | + +**Examples:** + +```proto +message Counters { + int8 tiny = 1; + int16 small = 2; + int32 medium = 3; + int64 large = 4; +} +``` + +**Python Type Hints:** + +Python's native `int` is arbitrary precision, so FDL uses type wrappers for fixed-width integers: + +```python +from dataclasses import dataclass + +from pyfory import int8, int16, int32 + +@dataclass +class Counters: + tiny: int8 + small: int16 + medium: int32 + large: int # int64 maps to native int +``` + 
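The range table above is what the fixed-width wrappers encode. As a minimal sketch that assumes nothing about pyfory itself, the hypothetical helpers `signed_range` and `fits` below compute the bounds of an N-bit signed integer and check a Python `int` against them. They are illustrative only and are not part of the Fory API.

```python
from typing import Tuple


def signed_range(bits: int) -> Tuple[int, int]:
    """Return (min, max) for a two's-complement signed integer of `bits` bits."""
    half = 1 << (bits - 1)
    return -half, half - 1


def fits(value: int, bits: int) -> bool:
    """True if `value` is representable in a signed integer of `bits` bits."""
    low, high = signed_range(bits)
    return low <= value <= high


# Hypothetical pre-check before assigning to a field declared as int8 in FDL.
print(signed_range(8))  # (-128, 127)
print(fits(128, 8))     # False: 128 is one past the int8 maximum
```

A check like this could run in application code before serialization, since a plain Python `int` will happily hold values that a fixed-width field in another language cannot.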
+### Floating-Point Types + +| FDL Type | Size | Precision | +| --------- | ------ | ------------- | +| `float32` | 32-bit | ~7 digits | +| `float64` | 64-bit | ~15-16 digits | + +**Language Mapping:** + +| FDL | Java | Python | Go | Rust | C++ | +| --------- | -------- | ---------------- | --------- | ----- | -------- | +| `float32` | `float` | `pyfory.float32` | `float32` | `f32` | `float` | +| `float64` | `double` | `pyfory.float64` | `float64` | `f64` | `double` | + +**Example:** + +```proto +message Coordinates { + float64 latitude = 1; + float64 longitude = 2; + float32 altitude = 3; +} +``` + +### String Type + +UTF-8 encoded text: + +```proto +string name = 1; +``` + +| Language | Type | Notes | +| -------- | ------------- | --------------------- | +| Java | `String` | Immutable | +| Python | `str` | | +| Go | `string` | Immutable | +| Rust | `String` | Owned, heap-allocated | +| C++ | `std::string` | | + +### Bytes Type + +Raw binary data: + +```proto +bytes data = 1; +``` + +| Language | Type | Notes | +| -------- | ---------------------- | --------- | +| Java | `byte[]` | | +| Python | `bytes` | Immutable | +| Go | `[]byte` | | +| Rust | `Vec<u8>` | | +| C++ | `std::vector<uint8_t>` | | + +### Temporal Types + +#### Date + +Calendar date without time: + +```proto +date birth_date = 1; +``` + +| Language | Type | Notes | +| -------- | -------------------------------- | ----------------------- | +| Java | `java.time.LocalDate` | | +| Python | `datetime.date` | | +| Go | `time.Time` | Time portion ignored | +| Rust | `chrono::NaiveDate` | Requires `chrono` crate | +| C++ | `fory::serialization::LocalDate` | | + +#### Timestamp + +Date and time with nanosecond precision: + +```proto +timestamp created_at = 1; +``` + +| Language | Type | Notes | +| -------- | -------------------------------- | ----------------------- | +| Java | `java.time.Instant` | UTC-based | +| Python | `datetime.datetime` | | +| Go | `time.Time` | | +| Rust | `chrono::NaiveDateTime` | 
Requires `chrono` crate | +| C++ | `fory::serialization::Timestamp` | | + +## Enum Types + +Enums define named integer constants: + +```proto +enum Priority [id=100] { + LOW = 0; + MEDIUM = 1; + HIGH = 2; + CRITICAL = 3; +} +``` + +**Language Mapping:** + +| Language | Implementation | +| -------- | --------------------------------------- | +| Java | `enum Priority { LOW, MEDIUM, ... }` | +| Python | `class Priority(IntEnum): LOW = 0, ...` | +| Go | `type Priority int32` with constants | +| Rust | `#[repr(i32)] enum Priority { ... }` | +| C++ | `enum class Priority : int32_t { ... }` | + +**Java:** + +```java +public enum Priority { + LOW, + MEDIUM, + HIGH, + CRITICAL; +} +``` + +**Python:** + +```python +class Priority(IntEnum): + LOW = 0 + MEDIUM = 1 + HIGH = 2 + CRITICAL = 3 +``` + +**Go:** + +```go +type Priority int32 + +const ( + PriorityLow Priority = 0 + PriorityMedium Priority = 1 + PriorityHigh Priority = 2 + PriorityCritical Priority = 3 +) +``` + +**Rust:** + +```rust +#[derive(ForyObject, Debug, Clone, PartialEq, Default)] +#[repr(i32)] +pub enum Priority { + #[default] + Low = 0, + Medium = 1, + High = 2, + Critical = 3, +} +``` + +**C++:** + +```cpp +enum class Priority : int32_t { + LOW = 0, + MEDIUM = 1, + HIGH = 2, + CRITICAL = 3, +}; +FORY_ENUM(Priority, LOW, MEDIUM, HIGH, CRITICAL); +``` + +## Message Types + +Messages are structured types composed of fields: + +```proto +message User [id=101] { + string id = 1; + string name = 2; + int32 age = 3; +} +``` + +**Language Mapping:** + +| Language | Implementation | +| -------- | ----------------------------------- | +| Java | POJO class with getters/setters | +| Python | `@dataclass` class | +| Go | Struct with exported fields | +| Rust | Struct with `#[derive(ForyObject)]` | +| C++ | Struct with `FORY_STRUCT` macro | + +## Collection Types + +### List (repeated) + +The `repeated` modifier creates a list: + +```proto +repeated string tags = 1; +repeated User users = 2; +``` + +**Language Mapping:** 
+ +| FDL | Java | Python | Go | Rust | C++ | +| ----------------- | --------------- | ------------ | ---------- | ------------- | -------------------------- | +| `repeated string` | `List<String>` | `List[str]` | `[]string` | `Vec<String>` | `std::vector<std::string>` | +| `repeated int32` | `List<Integer>` | `List[int]` | `[]int32` | `Vec<i32>` | `std::vector<int32_t>` | +| `repeated User` | `List<User>` | `List[User]` | `[]User` | `Vec<User>` | `std::vector<User>` | + +**List modifiers:** + +| FDL | Java | Python | Go | Rust | C++ | +| -------------------------- | ---------------------------------------------- | --------------------------------------- | ----------------------- | --------------------- | ----------------------------------------- | +| `optional repeated string` | `List<String>` + `@ForyField(nullable = true)` | `Optional[List[str]]` | `[]string` + `nullable` | `Option<Vec<String>>` | `std::optional<std::vector<std::string>>` | +| `repeated optional string` | `List<String>` (nullable elements) | `List[Optional[str]]` | `[]*string` | `Vec<Option<String>>` | `std::vector<std::optional<std::string>>` | +| `ref repeated User` | `List<User>` + `@ForyField(ref = true)` | `List[User]` + `pyfory.field(ref=True)` | `[]User` + `ref` | `Arc<Vec<User>>`\* | `std::shared_ptr<std::vector<User>>` | +| `repeated ref User` | `List<User>` | `List[User]` | `[]*User` + `ref=false` | `Vec<Arc<User>>`\* | `std::vector<std::shared_ptr<User>>` | + +\*Use `[(fory).thread_safe_pointer = false]` to generate `Rc` instead of `Arc` in Rust. 
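The difference between `optional repeated` and `repeated optional` is easy to miss in the table. The sketch below uses only the Python standard library and the Python column of the table above; the class name `Labels` and its field names are hypothetical, and this is not actual generator output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Labels:
    # `optional repeated string`: the list itself may be absent (None).
    maybe_tags: Optional[List[str]] = None
    # `repeated optional string`: the list is always present,
    # but individual elements may be None.
    sparse_tags: List[Optional[str]] = field(default_factory=list)


labels = Labels()
labels.sparse_tags.extend(["a", None, "c"])

print(labels.maybe_tags is None)  # True: no list was set
print(len(labels.sparse_tags))    # 3: the None element still occupies a slot
```

The first form suits fields where "no list" and "empty list" mean different things. The second suits positional data where gaps must be preserved.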
+ +### Map + +Maps with typed keys and values: + +```proto +map<string, int32> counts = 1; +map<string, User> users = 2; +``` + +**Language Mapping:** + +| FDL | Java | Python | Go | Rust | C++ | +| -------------------- | ---------------------- | ----------------- | ------------------ | ----------------------- | -------------------------------- | +| `map<string, int32>` | `Map<String, Integer>` | `Dict[str, int]` | `map[string]int32` | `HashMap<String, i32>` | `std::map<std::string, int32_t>` | +| `map<string, User>` | `Map<String, User>` | `Dict[str, User]` | `map[string]User` | `HashMap<String, User>` | `std::map<std::string, User>` | + +**Key Type Restrictions:** + +Map keys should be hashable types: + +- `string` (most common) +- Integer types (`int8`, `int16`, `int32`, `int64`) +- `bool` + +Avoid using messages or complex types as keys. + +## Nullable Types + +The `optional` modifier makes a field nullable: + +```proto +message Profile { + string name = 1; // Required + optional string bio = 2; // Nullable + optional int32 age = 3; // Nullable integer +} +``` + +**Language Mapping:** + +| FDL | Java | Python | Go | Rust | C++ | +| ----------------- | ---------- | --------------- | --------- | ---------------- | ---------------------------- | +| `optional string` | `String`\* | `Optional[str]` | `*string` | `Option<String>` | `std::optional<std::string>` | +| `optional int32` | `Integer` | `Optional[int]` | `*int32` | `Option<i32>` | `std::optional<int32_t>` | + +\*Java uses boxed types with `@ForyField(nullable = true)` annotation. + +**Default Values:** + +| Type | Default Value | +| ------------------ | ------------------- | +| Non-optional types | Language default | +| Optional types | `null`/`None`/`nil` | + +## Reference Types + +The `ref` modifier enables reference tracking: + +```proto +message TreeNode { + string value = 1; + ref TreeNode parent = 2; + repeated ref TreeNode children = 3; +} +``` + +**Use Cases:** + +1. 
**Shared References**: Same object referenced from multiple places +2. **Circular References**: Object graphs with cycles +3. **Large Objects**: Avoid duplicate serialization + +**Language Mapping:** + +| FDL | Java | Python | Go | Rust | C++ | +| ---------- | -------- | ------ | ---------------------- | ----------- | ----------------------- | +| `ref User` | `User`\* | `User` | `*User` + `fory:"ref"` | `Arc<User>` | `std::shared_ptr<User>` | + +\*Java uses `@ForyField(ref = true)` annotation. + +Rust uses `Arc` by default; set `[(fory).thread_safe_pointer = false]` to use `Rc`. + +## Type Compatibility Matrix + +This matrix shows which type conversions are safe across languages: + +| From → To | bool | int8 | int16 | int32 | int64 | float32 | float64 | string | +| ----------- | ---- | ---- | ----- | ----- | ----- | ------- | ------- | ------ | +| **bool** | ✓ | ✓ | ✓ | ✓ | ✓ | - | - | - | +| **int8** | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | +| **int16** | - | - | ✓ | ✓ | ✓ | ✓ | ✓ | - | +| **int32** | - | - | - | ✓ | ✓ | - | ✓ | - | +| **int64** | - | - | - | - | ✓ | - | - | - | +| **float32** | - | - | - | - | - | ✓ | ✓ | - | +| **float64** | - | - | - | - | - | - | ✓ | - | +| **string** | - | - | - | - | - | - | - | ✓ | + +✓ = Safe conversion, - = Not recommended + +## Best Practices + +### Choosing Integer Types + +- Use `int32` as the default for most integers +- Use `int64` for large values (timestamps, IDs) +- Use `int8`/`int16` only when storage size matters + +### String vs Bytes + +- Use `string` for text data (UTF-8) +- Use `bytes` for binary data (images, files, encrypted data) + +### Optional vs Required + +- Use `optional` when the field may legitimately be absent +- Default to required fields for better type safety +- Document why a field is optional + +### Reference Tracking + +- Use `ref` only when needed (shared/circular references) +- Reference tracking adds overhead +- Test with realistic data to ensure correctness + +### Collections + +- Prefer 
`repeated` for ordered sequences +- Use `map` for key-value lookups +- Consider message types for complex map values
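The Type Compatibility Matrix earlier in this document marks `int32` to `float32` as not recommended while allowing `int16` to `float32` and `int32` to `float64`. A likely reason is mantissa width: IEEE 754 `float32` carries 24 significant bits and `float64` carries 53. The sketch below demonstrates this with the standard `struct` module; `roundtrip_float32` is a hypothetical helper and is unrelated to how Fory encodes values.

```python
import struct


def roundtrip_float32(value: int) -> float:
    """Store an int as an IEEE 754 float32 and read it back as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


int16_max = 2**15 - 1
first_lossy = 2**24 + 1  # smallest positive int that float32 cannot represent
int32_max = 2**31 - 1

print(roundtrip_float32(int16_max) == int16_max)      # True: fits in 24 bits
print(roundtrip_float32(first_lossy) == first_lossy)  # False: rounded to 2**24
print(float(int32_max) == int32_max)                  # True: float64 has 53 bits
```

The same reasoning explains why `int64` to `float64` is also marked as not recommended: values above 2^53 no longer round-trip exactly.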
