This is an automated email from the ASF dual-hosted git repository. chaokunyang pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/fory-site.git
commit 2939960a11d376f2a0a56b9b6b3843e0269eac65 Author: chaokunyang <[email protected]> AuthorDate: Wed Jun 4 02:40:02 2025 +0000 created local 'docs/guide/' from remote 'docs/guide/' --- docs/guide/DEVELOPMENT.md | 122 +++++++ docs/guide/graalvm_guide.md | 256 +++++++++++++ docs/guide/java_serialization_guide.md | 628 ++++++++++++++++++++++++++++++++ docs/guide/row_format_guide.md | 154 ++++++++ docs/guide/scala_guide.md | 170 +++++++++ docs/guide/xlang_serialization_guide.md | 612 +++++++++++++++++++++++++++++++ docs/guide/xlang_type_mapping.md | 116 ++++++ 7 files changed, 2058 insertions(+) diff --git a/docs/guide/DEVELOPMENT.md b/docs/guide/DEVELOPMENT.md new file mode 100644 index 00000000..dffe0995 --- /dev/null +++ b/docs/guide/DEVELOPMENT.md @@ -0,0 +1,122 @@ +--- +title: Development +sidebar_position: 7 +id: development +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +## How to build Fory + +Please check out the source tree from https://github.com/apache/fory. 
+ +### Build Fory Java + +```bash +cd java +mvn clean compile -DskipTests +``` + +#### Environment Requirements + +- java 1.8+ +- maven 3.6.3+ + +### Build Fory Python + +```bash +cd python +# Uninstall numpy first so that when we install pyarrow, it will install the correct numpy version automatically. +# For Python versions less than 3.13, numpy 2 is not currently supported. +pip uninstall -y numpy +# Install necessary environment for Python < 3.13. +pip install pyarrow==15.0.0 Cython wheel pytest +# For Python 3.13, pyarrow 18.0.0 is available and requires numpy version greater than 2. +# pip install pyarrow==18.0.0 Cython wheel pytest +pip install -v -e . +``` + +#### Environment Requirements + +- python 3.6+ + +### Build Fory C++ + +Build fory row format: + +```bash +pip install pyarrow==15.0.0 +bazel build //cpp/fory/row:fory_row_format +``` + +Build fory row format encoder: + +```bash +pip install pyarrow==15.0.0 +bazel build //cpp/fory/encoder:fory_encoder +``` + +#### Environment Requirements + +- compilers with C++17 support +- bazel 6.3.2 + +### Build Fory GoLang + +```bash +cd go/fory +# run test +go test -v +# run xlang test +go test -v fory_xlang_test.go +``` + +#### Environment Requirements + +- go 1.13+ + +### Build Fory Rust + +```bash +cd rust +# build +cargo build +# run test +cargo test +``` + +#### Environment Requirements + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +``` + +### Build Fory JavaScript + +```bash +cd javascript +npm install + +# run build +npm run build +# run test +npm run test +``` + +#### Environment Requirements + +- node 14+ +- npm 8+ diff --git a/docs/guide/graalvm_guide.md b/docs/guide/graalvm_guide.md new file mode 100644 index 00000000..59fead71 --- /dev/null +++ b/docs/guide/graalvm_guide.md @@ -0,0 +1,256 @@ +--- +title: GraalVM Guide +sidebar_position: 6 +id: graalvm_guide +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. 
See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +## GraalVM Native Image + +GraalVM `native image` can compile Java code into native code ahead of time to build faster, smaller, leaner applications. +A native image has no JIT compiler to compile bytecode into machine code, and doesn't support +reflection unless a reflection configuration file is provided. + +Fory runs well on GraalVM native image. Fory generates all serializer code for the `Fory JIT framework` and `MethodHandle/LambdaMetafactory` at GraalVM build time, then uses that generated code for serialization at runtime without +any extra cost, so performance is good. + +To use Fory on GraalVM native image, you must create Fory as a **static** field of a class, and **register** all classes + when the enclosing class is initialized. Then configure `native-image.properties` under +`resources/META-INF/native-image/$xxx/native-image.properties` to tell GraalVM to initialize the class at native image +build time. 
For example, here we configure the `org.apache.fory.graalvm.Example` class to be initialized at build time: + +```properties +Args = --initialize-at-build-time=org.apache.fory.graalvm.Example +``` + +Another benefit of using Fory is that you don't have to configure [reflection json](https://www.graalvm.org/latest/reference-manual/native-image/metadata/#specifying-reflection-metadata-in-json) and +[serialization json](https://www.graalvm.org/latest/reference-manual/native-image/metadata/#serialization), which is +tedious and cumbersome. When using Fory, you just need to invoke +`org.apache.fory.Fory.register(Class<?>, boolean)` for every type you want to serialize. + +Note that the Fory `asyncCompilationEnabled` option is disabled automatically for GraalVM native image, since a +native image doesn't support JIT at image run time. + +## Not thread-safe Fory + +Example: + +```java +import org.apache.fory.Fory; +import org.apache.fory.util.Preconditions; + +import java.util.List; +import java.util.Map; + +public class Example { + public record Record ( + int f1, + String f2, + List<String> f3, + Map<String, Long> f4) { + } + + static Fory fory; + + static { + fory = Fory.builder().build(); + // register and generate serializer code. 
+ fory.register(Record.class, true); + } + + public static void main(String[] args) { + Record record = new Record(10, "abc", List.of("str1", "str2"), Map.of("k1", 10L, "k2", 20L)); + System.out.println(record); + byte[] bytes = fory.serialize(record); + Object o = fory.deserialize(bytes); + System.out.println(o); + Preconditions.checkArgument(record.equals(o)); + } +} +``` + +Then add `org.apache.fory.graalvm.Example` build time init to `native-image.properties` configuration: + +```properties +Args = --initialize-at-build-time=org.apache.fory.graalvm.Example +``` + +## Thread-safe Fory + +```java +import org.apache.fory.Fory; +import org.apache.fory.ThreadLocalFory; +import org.apache.fory.ThreadSafeFory; +import org.apache.fory.util.Preconditions; + +import java.util.List; +import java.util.Map; + +public class ThreadSafeExample { + public record Foo ( + int f1, + String f2, + List<String> f3, + Map<String, Long> f4) { + } + + static ThreadSafeFory fory; + + static { + fory = new ThreadLocalFory(classLoader -> { + Fory f = Fory.builder().build(); + // register and generate serializer code. 
+ f.register(Foo.class, true); + return f; + }); + } + + public static void main(String[] args) { + System.out.println(fory.deserialize(fory.serialize("abc"))); + System.out.println(fory.deserialize(fory.serialize(List.of(1,2,3)))); + System.out.println(fory.deserialize(fory.serialize(Map.of("k1", 1, "k2", 2)))); + Foo foo = new Foo(10, "abc", List.of("str1", "str2"), Map.of("k1", 10L, "k2", 20L)); + System.out.println(foo); + byte[] bytes = fory.serialize(foo); + Object o = fory.deserialize(bytes); + System.out.println(o); + } +} +``` + +Then add `org.apache.fory.graalvm.ThreadSafeExample` build time init to `native-image.properties` configuration: + +```properties +Args = --initialize-at-build-time=org.apache.fory.graalvm.ThreadSafeExample +``` + +## Framework Integration + +Framework developers who want to integrate Fory for serialization can provide a configuration file in which +users list all the classes they want to serialize. The framework can then load those classes, invoke +`org.apache.fory.Fory.register(Class<?>, boolean)` to register them in its Fory integration class, and configure that +class to be initialized at GraalVM native image build time. + +## Benchmark + +Here are benchmarks for two classes, comparing Fory with JDK serialization on GraalVM native image. + +When Fory compression is disabled: + +- Struct: Fory is `46x speed, 43% size` compared to JDK. +- Pojo: Fory is `12x speed, 56% size` compared to JDK. + +When Fory compression is enabled: + +- Struct: Fory is `24x speed, 31% size` compared to JDK. +- Pojo: Fory is `12x speed, 48% size` compared to JDK. + +See [[Benchmark.java](https://github.com/apache/fory/blob/main/integration_tests/graalvm_tests/src/main/java/org/apache/fory/graalvm/Benchmark.java)] for benchmark code. 
+ +### Struct Benchmark + +#### Class Fields + +```java +public class Struct implements Serializable { + public int f1; + public long f2; + public float f3; + public double f4; + public int f5; + public long f6; + public float f7; + public double f8; + public int f9; + public long f10; + public float f11; + public double f12; +} +``` + +#### Benchmark Results + +No compression: + +``` +Benchmark repeat number: 400000 +Object type: class org.apache.fory.graalvm.Struct +Compress number: false +Fory size: 76.0 +JDK size: 178.0 +Fory serialization took mills: 49 +JDK serialization took mills: 2254 +Compare speed: Fory is 45.70x speed of JDK +Compare size: Fory is 0.43x size of JDK +``` + +Compress number: + +``` +Benchmark repeat number: 400000 +Object type: class org.apache.fory.graalvm.Struct +Compress number: true +Fory size: 55.0 +JDK size: 178.0 +Fory serialization took mills: 130 +JDK serialization took mills: 3161 +Compare speed: Fory is 24.16x speed of JDK +Compare size: Fory is 0.31x size of JDK +``` + +### Pojo Benchmark + +#### Class Fields + +```java +public class Foo implements Serializable { + int f1; + String f2; + List<String> f3; + Map<String, Long> f4; +} +``` + +#### Benchmark Results + +No compression: + +``` +Benchmark repeat number: 400000 +Object type: class org.apache.fory.graalvm.Foo +Compress number: false +Fory size: 541.0 +JDK size: 964.0 +Fory serialization took mills: 1663 +JDK serialization took mills: 16266 +Compare speed: Fory is 12.19x speed of JDK +Compare size: Fory is 0.56x size of JDK +``` + +Compress number: + +``` +Benchmark repeat number: 400000 +Object type: class org.apache.fory.graalvm.Foo +Compress number: true +Fory size: 459.0 +JDK size: 964.0 +Fory serialization took mills: 1289 +JDK serialization took mills: 15069 +Compare speed: Fory is 12.11x speed of JDK +Compare size: Fory is 0.48x size of JDK +``` diff --git a/docs/guide/java_serialization_guide.md b/docs/guide/java_serialization_guide.md new file mode 100644 index 
00000000..af85d1bb --- /dev/null +++ b/docs/guide/java_serialization_guide.md @@ -0,0 +1,628 @@ +--- +title: Java Serialization Guide +sidebar_position: 0 +id: java_object_graph_guide +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +## Java object graph serialization + +When only Java object serialization is needed, this mode performs better than cross-language object +graph serialization. + +## Quick Start + +Note that creating a Fory instance is not cheap: **Fory instances should be reused between serializations** instead of being created +every time. +Keep the Fory instance in a static global variable, or in an instance variable of a singleton or of a limited number of objects. + +Fory for single-thread usage: + +```java +import java.util.List; +import java.util.Arrays; + +import org.apache.fory.*; +import org.apache.fory.config.*; + +public class Example { + public static void main(String[] args) { + SomeClass object = new SomeClass(); + // Note that Fory instances should be reused between + // multiple serializations of different objects. + Fory fory = Fory.builder().withLanguage(Language.JAVA) + .requireClassRegistration(true) + .build(); + // Registering types can reduce class name serialization overhead, but not mandatory. 
+ // If class registration enabled, all custom types must be registered. + fory.register(SomeClass.class); + byte[] bytes = fory.serialize(object); + System.out.println(fory.deserialize(bytes)); + } +} +``` + +Fory for multiple-thread usage: + +```java +import java.util.List; +import java.util.Arrays; + +import org.apache.fory.*; +import org.apache.fory.config.*; + +public class Example { + public static void main(String[] args) { + SomeClass object = new SomeClass(); + // Note that Fory instances should be reused between + // multiple serializations of different objects. + ThreadSafeFory fory = new ThreadLocalFory(classLoader -> { + Fory f = Fory.builder().withLanguage(Language.JAVA) + .withClassLoader(classLoader).build(); + f.register(SomeClass.class); + return f; + }); + byte[] bytes = fory.serialize(object); + System.out.println(fory.deserialize(bytes)); + } +} +``` + +Fory instances reuse example: + +```java +import java.util.List; +import java.util.Arrays; + +import org.apache.fory.*; +import org.apache.fory.config.*; + +public class Example { + // reuse fory. + private static final ThreadSafeFory fory = new ThreadLocalFory(classLoader -> { + Fory f = Fory.builder().withLanguage(Language.JAVA) + .withClassLoader(classLoader).build(); + f.register(SomeClass.class); + return f; + }); + + public static void main(String[] args) { + SomeClass object = new SomeClass(); + byte[] bytes = fory.serialize(object); + System.out.println(fory.deserialize(bytes)); + } +} +``` + +## ForyBuilder options + +| Option Name | Description [...] 
+|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...] +| `timeRefIgnored` | Whether to ignore reference tracking of all time types registered in `TimeSerializers` and subclasses of those types when ref tracking is enabled. If ignored, ref tracking of every time type can be enabled by invoking `Fory#registerSerializer(Class, Serializer)`. For example, `fory.registerSerializer(Date.class, new DateSerializer(fory, true))`. Note that enabling ref tracking should happen before serializer codegen of any types which contain time [...] +| `compressInt` | Enables or disables int compression for smaller size. [...] +| `compressLong` | Enables or disables long compression for smaller size. [...] +| `compressString` | Enables or disables string compression for smaller size. [...] +| `classLoader` | The classloader should not be updated; Fory caches class metadata. Use `LoaderBinding` or `ThreadSafeFory` for classloader updates. [...] +| `compatibleMode` | Type forward/backward compatibility config. Also Related to `checkClassVersion` config. `SCHEMA_CONSISTENT`: Class schema must be consistent between serialization peer and deserialization peer. `COMPATIBLE`: Class schema can be different between serialization peer and deserialization peer. They can add/delete fields independently. [See more](#class-inconsistency-and-class-version-check). [...] +| `checkClassVersion` | Determines whether to check the consistency of the class schema. If enabled, Fory checks, writes, and checks consistency using the `classVersionHash`. 
It will be automatically disabled when `CompatibleMode#COMPATIBLE` is enabled. Disabling is not recommended unless you can ensure the class won't evolve. [...] +| `checkJdkClassSerializable` | Enables or disables checking of `Serializable` interface for classes under `java.*`. If a class under `java.*` is not `Serializable`, Fory will throw an `UnsupportedOperationException`. [...] +| `registerGuavaTypes` | Whether to pre-register Guava types such as `RegularImmutableMap`/`RegularImmutableList`. These types are not public API, but seem pretty stable. [...] +| `requireClassRegistration` | Disabling may allow unknown classes to be deserialized, potentially causing security risks. [...] +| `suppressClassRegistrationWarnings` | Whether to suppress class registration warnings. The warnings can be used for security audit, but may be annoying, this suppression will be enabled by default. [...] +| `metaShareEnabled` | Enables or disables meta share mode. [...] +| `scopedMetaShareEnabled` | Scoped meta share focuses on a single serialization process. Metadata created or identified during this process is exclusive to it and is not shared with by other serializations. [...] +| `metaCompressor` | Set a compressor for meta compression. Note that the passed MetaCompressor should be thread-safe. By default, a `Deflater` based compressor `DeflaterMetaCompressor` will be used. Users can pass other compressor such as `zstd` for better compression rate. [...] +| `deserializeNonexistentClass` | Enables or disables deserialization/skipping of data for non-existent classes. [...] +| `codeGenEnabled` | Disabling may result in faster initial serialization but slower subsequent serializations. [...] +| `asyncCompilationEnabled` | If enabled, serialization uses interpreter mode first and switches to JIT serialization after async serializer JIT for a class is finished. [...] +| `scalaOptimizationEnabled` | Enables or disables Scala-specific serialization optimization. [...] 
+| `copyRef` | When disabled, the copy performance will be better. But fory deep copy will ignore circular and shared reference. Same reference of an object graph will be copied into different objects in one `Fory#copy`. [...] +| `serializeEnumByName` | When Enabled, fory serialize enum by name instead of ordinal. [...] + +## Advanced Usage + +### Fory creation + +Single thread fory: + +```java +Fory fory = Fory.builder() + .withLanguage(Language.JAVA) + // enable reference tracking for shared/circular reference. + // Disable it will have better performance if no duplicate reference. + .withRefTracking(false) + .withCompatibleMode(CompatibleMode.SCHEMA_CONSISTENT) + // enable type forward/backward compatibility + // disable it for small size and better performance. + // .withCompatibleMode(CompatibleMode.COMPATIBLE) + // enable async multi-threaded compilation. + .withAsyncCompilation(true) + .build(); +byte[] bytes = fory.serialize(object); +System.out.println(fory.deserialize(bytes)); +``` + +Thread-safe fory: + +```java +ThreadSafeFory fory = Fory.builder() + .withLanguage(Language.JAVA) + // enable reference tracking for shared/circular reference. + // Disable it will have better performance if no duplicate reference. + .withRefTracking(false) + // compress int for smaller size + // .withIntCompressed(true) + // compress long for smaller size + // .withLongCompressed(true) + .withCompatibleMode(CompatibleMode.SCHEMA_CONSISTENT) + // enable type forward/backward compatibility + // disable it for small size and better performance. + // .withCompatibleMode(CompatibleMode.COMPATIBLE) + // enable async multi-threaded compilation. + .withAsyncCompilation(true) + .buildThreadSafeFory(); +byte[] bytes = fory.serialize(object); +System.out.println(fory.deserialize(bytes)); +``` + +### Handling Class Schema Evolution in Serialization + +In many systems, the schema of a class used for serialization may change over time. 
For instance, fields within a class +may be added or removed. When serialization and deserialization processes use different versions of jars, the schema of +the class being deserialized may differ from the one used during serialization. + +By default, Fory serializes objects using the `CompatibleMode.SCHEMA_CONSISTENT` mode. This mode assumes that the +deserialization process uses the same class schema as the serialization process, minimizing payload overhead. +However, if there is a schema inconsistency, deserialization will fail. + +If the schema is expected to change, users must configure Fory to use `CompatibleMode.COMPATIBLE` so that +deserialization still succeeds, i.e. to get schema forward/backward compatibility. This can be done using the +`ForyBuilder#withCompatibleMode(CompatibleMode.COMPATIBLE)` method. +In this compatible mode, deserialization can handle schema changes such as missing or extra fields, allowing it to +succeed even when the serialization and deserialization processes have different class schemas. + +Here is an example of creating Fory to support schema evolution: + +```java +Fory fory = Fory.builder() + .withCompatibleMode(CompatibleMode.COMPATIBLE) + .build(); + +byte[] bytes = fory.serialize(object); +System.out.println(fory.deserialize(bytes)); +``` + +This compatible mode involves serializing class metadata into the serialized output. Despite Fory's use of +sophisticated compression techniques to minimize overhead, there is still some additional space cost associated with +class metadata. + +To further reduce metadata costs, Fory introduces a class metadata sharing mechanism, which allows the metadata to be +sent to the deserialization process only once. For more details, please refer to the [Meta Sharing](#meta-sharing) +section. + +### Smaller size + +`ForyBuilder#withIntCompressed`/`ForyBuilder#withLongCompressed` can be used to compress int/long for smaller size. +Normally compressing int is enough. 
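The integer encodings this section describes can be illustrated with a small, self-contained sketch. This is not Fory's implementation: `CompressionSketch` and its method names are hypothetical, the variable-length int encoding assumes the continuation flag is the most significant bit of each byte (as in LEB128), and the SLI helpers simply follow the byte layout given in this section.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only; Fory's real encoders live in its own buffer classes.
final class CompressionSketch {
  // Encodes an int in 1 to 5 bytes: 7 payload bits per byte,
  // high bit set (an assumption) when another byte follows.
  static byte[] writeVarUint32(int v) {
    byte[] out = new byte[5];
    int n = 0;
    while ((v & ~0x7F) != 0) {
      out[n++] = (byte) ((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out[n++] = (byte) v;
    return Arrays.copyOf(out, n);
  }

  // Reads bytes until one without the continuation bit is found.
  static int readVarUint32(byte[] in) {
    int result = 0;
    int shift = 0;
    for (byte b : in) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break;
      }
      shift += 7;
    }
    return result;
  }

  // Zigzag mapping from the text: small negative numbers become small positive ones.
  static long zigZag(long v) {
    return (v << 1) ^ (v >> 63);
  }

  static long unZigZag(long z) {
    return (z >>> 1) ^ -(z & 1);
  }

  // SLI layout from the text: 4 bytes for values in [-2^30, 2^30 - 1], else flag byte + 8 bytes.
  static byte[] writeSliInt64(long value) {
    if (value >= -1073741824L && value <= 1073741823L) {
      return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
          .putInt(((int) value) << 1).array();
    }
    return ByteBuffer.allocate(9).order(ByteOrder.LITTLE_ENDIAN)
        .put((byte) 0b1).putLong(value).array();
  }

  static long readSliInt64(byte[] in) {
    ByteBuffer buf = ByteBuffer.wrap(in).order(ByteOrder.LITTLE_ENDIAN);
    if ((in[0] & 1) == 0) {
      return buf.getInt() >> 1; // arithmetic shift restores the sign
    }
    buf.get(); // skip the flag byte
    return buf.getLong();
  }

  public static void main(String[] args) {
    System.out.println(writeVarUint32(1).length);       // 1
    System.out.println(writeVarUint32(300).length);     // 2
    System.out.println(zigZag(-1L));                    // 1
    System.out.println(writeSliInt64(42L).length);      // 4
    System.out.println(writeSliInt64(1L << 40).length); // 9
  }
}
```

The sketch shows why compression helps small values and costs CPU: every encoded int needs a loop and a branch per byte, while values near the top of the range take more bytes than the fixed-width form.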
+ +Both compressions are enabled by default. If serialized size is not important, for example because you previously used flatbuffers for +serialization, which doesn't compress anything, then you should disable compression. If your data are all +numbers, +compression may cause a performance regression of up to 80%. + +For int compression, Fory uses 1~5 bytes for encoding. The first bit in every byte indicates whether there is a next byte. If the first +bit is set, the next byte is read, until the first bit of a byte is unset. + +For long compression, Fory supports two encodings: + +- Fory SLI (Small long as int) Encoding (**used by default**): + - If long is in `[-1073741824, 1073741823]`, encode as 4 bytes int: `| little-endian: ((int) value) << 1 |` + - Otherwise write as 9 bytes: `| 0b1 | little-endian 8bytes long |` +- Fory PVL (Progressive Variable-length Long) Encoding: + - The first bit in every byte indicates whether there is a next byte. If the first bit is set, the next byte is read, until + the first bit of a byte is unset. + - Negative numbers are converted to positive numbers by `(v << 1) ^ (v >> 63)` to reduce the cost of small negative + numbers. + +If a field is of `long` type, its values mostly can't be represented in fewer bytes, so compression often won't save enough +space +to justify its performance cost. Consider disabling long compression if you find it doesn't bring +much +space savings. + +### Object deep copy + +Deep copy example: + +```java +Fory fory = Fory.builder().withRefCopy(true).build(); +SomeClass a = xxx; +SomeClass copied = fory.copy(a); +``` + +To make Fory deep copy ignore circular and shared references, disable `withRefCopy`. +In this mode, the same reference within an object graph will be copied into different objects in one `Fory#copy`. 
+ +```java +Fory fory = Fory.builder().withRefCopy(false).build(); +SomeClass a = xxx; +SomeClass copied = fory.copy(a); +``` + +### Implement a customized serializer + +In some cases, you may want to implement a serializer for your type, especially for classes that customize serialization with +the JDK +writeObject/writeReplace/readObject/readResolve methods, which is very inefficient. For example, if you don't want +the following `Foo#writeObject` +to be invoked, you can take the following `FooSerializer` as an example: + +```java +class Foo { + public long f1; + + private void writeObject(ObjectOutputStream s) throws IOException { + System.out.println(f1); + s.defaultWriteObject(); + } +} + +class FooSerializer extends Serializer<Foo> { + public FooSerializer(Fory fory) { + super(fory, Foo.class); + } + + @Override + public void write(MemoryBuffer buffer, Foo value) { + buffer.writeInt64(value.f1); + } + + @Override + public Foo read(MemoryBuffer buffer) { + Foo foo = new Foo(); + foo.f1 = buffer.readInt64(); + return foo; + } +} +``` + +Register serializer: + +```java +Fory fory = getFory(); +fory.registerSerializer(Foo.class, new FooSerializer(fory)); +``` + +### Security & Class Registration + +`ForyBuilder#requireClassRegistration` can be used to disable class registration. This allows deserializing objects +of unknown types, +which is more flexible but **may be insecure if the classes contain malicious code**. + +**Do not disable class registration unless you can ensure your environment is secure**. +Malicious code in `init/equals/hashCode` can be executed when deserializing unknown/untrusted types while this option +is disabled. + +Class registration not only reduces security risks, but also avoids the cost of serializing class names. + +You can register a class with the `Fory#register` API. + +Note that class registration order is important: the serialization and deserialization peers +should use the same registration order. 
+ +```java +Fory fory = xxx; +fory.register(SomeClass.class); +fory.register(SomeClass1.class, 200); +``` + +If you invoke `ForyBuilder#requireClassRegistration(false)` to disable the class registration check, +you can set an `org.apache.fory.resolver.ClassChecker` via `ClassResolver#setClassChecker` to control which classes are +allowed +for serialization. For example, you can allow classes whose names start with `org.example.` by: + +```java +Fory fory = xxx; +fory.getClassResolver().setClassChecker( + (classResolver, className) -> className.startsWith("org.example.")); +``` + +Fory also provides `org.apache.fory.resolver.AllowListChecker`, an allow/disallow list based checker that +simplifies +customizing the class check mechanism. You can use this checker or implement a more sophisticated checker +yourself. + +```java +AllowListChecker checker = new AllowListChecker(AllowListChecker.CheckLevel.STRICT); +ThreadSafeFory fory = new ThreadLocalFory(classLoader -> { + Fory f = Fory.builder().requireClassRegistration(true).withClassLoader(classLoader).build(); + f.getClassResolver().setClassChecker(checker); + checker.addListener(f.getClassResolver()); + return f; +}); +checker.allowClass("org.example.*"); +``` + +### Register class by name + +Registering a class by id gives better performance and smaller space overhead. But in some cases, managing a large set +of type ids is complex. In such cases, registering the class by name using the API +`register(Class<?> cls, String namespace, String typeName)` is recommended. + +```java +fory.register(Foo.class, "demo", "Foo"); +``` + +If there are no duplicate type names, `namespace` can be left empty to reduce serialized size. + +**Avoid this API when registering by id is practical, since it increases serialized size considerably compared to registering +classes by id.** + +### Serializer Registration + +You can also register a custom serializer for a class with the `Fory#registerSerializer` API. 
+ +Or implement `java.io.Externalizable` for a class. 
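As a rough sketch of the `java.io.Externalizable` option, the class below writes and reads its own fields. `ExternalizableSketch` and `Point` are hypothetical names, and the round trip uses plain JDK object streams only to keep the example self-contained; with Fory you would presumably pass such an object to `fory.serialize` as in the earlier examples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

final class ExternalizableSketch {
  // Hypothetical value type that controls its own wire format.
  public static class Point implements Externalizable {
    long x;
    long y;

    // Externalizable requires a public no-arg constructor.
    public Point() {}

    Point(long x, long y) {
      this.x = x;
      this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
      out.writeLong(x);
      out.writeLong(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
      // Fields must be read in the same order they were written.
      x = in.readLong();
      y = in.readLong();
    }
  }

  // Serializes and deserializes with JDK streams (an assumption made for self-containment).
  static Point roundTrip(Point p) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(p);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (Point) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Point copy = roundTrip(new Point(3, 4));
    System.out.println(copy.x + "," + copy.y); // prints "3,4"
  }
}
```

Compared with a registered `Serializer`, this keeps the encoding logic inside the class itself, at the price of tying the class to the `ObjectOutput`/`ObjectInput` interfaces.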
+ +### Zero-Copy Serialization + +```java +import org.apache.fory.*; +import org.apache.fory.config.*; +import org.apache.fory.serializer.BufferObject; +import org.apache.fory.memory.MemoryBuffer; + +import java.util.*; +import java.util.stream.Collectors; + +public class ZeroCopyExample { + // Note that the fory instance should be reused instead of being created every time. + static Fory fory = Fory.builder() + .withLanguage(Language.JAVA) + .build(); + + // mvn exec:java -Dexec.mainClass="io.ray.fory.examples.ZeroCopyExample" + public static void main(String[] args) { + List<Object> list = Arrays.asList("str", new byte[1000], new int[100], new double[100]); + Collection<BufferObject> bufferObjects = new ArrayList<>(); + byte[] bytes = fory.serialize(list, e -> !bufferObjects.add(e)); + List<MemoryBuffer> buffers = bufferObjects.stream() + .map(BufferObject::toBuffer).collect(Collectors.toList()); + System.out.println(fory.deserialize(bytes, buffers)); + } +} +``` + +### Meta Sharing + +Fory supports sharing type metadata (class name, field name, final field type information, etc.) between multiple +serializations in a context (e.g. a TCP connection). This information is sent to the peer during the first +serialization in the context. Based on this metadata, the peer can rebuild the same deserializer, which avoids +transmitting metadata for subsequent serializations, reduces network traffic, and supports type +forward/backward compatibility automatically. + +```java +// Fory.builder() +// .withLanguage(Language.JAVA) +// .withRefTracking(false) +// // share meta across serialization. +// .withMetaContextShare(true) +// Not thread-safe fory. +MetaContext context = xxx; +fory.getSerializationContext().setMetaContext(context); +byte[] bytes = fory.serialize(o); +// Not thread-safe fory. 
+MetaContext context = xxx; +fory.getSerializationContext().setMetaContext(context); +fory.deserialize(bytes); + +// Thread-safe fory +fory.setClassLoader(beanA.getClass().getClassLoader()); +byte[] serialized = fory.execute( + f -> { + f.getSerializationContext().setMetaContext(context); + return f.serialize(beanA); + } +); +// thread-safe fory +fory.setClassLoader(beanA.getClass().getClassLoader()); +Object newObj = fory.execute( + f -> { + f.getSerializationContext().setMetaContext(context); + return f.deserialize(serialized); + } +); +``` + +### Deserialize non-existent classes + +Fory supports deserializing non-existent classes. This feature can be enabled +by `ForyBuilder#deserializeNonexistentClass(true)`. When it is enabled and metadata sharing is enabled, Fory stores +the deserialized data of such a type in a lazy subclass of Map. By using the lazy map implemented by Fory, the rebalance +cost of filling the map during deserialization can be avoided, which further improves performance. If this data is sent to +another process and the class exists in that process, the data will be deserialized into an object of this type without +losing any information. + +If metadata sharing is not enabled, the new class data will be skipped and a `NonexistentSkipClass` stub object will be +returned. + +### Copying/Mapping object from one type to another type + +Fory supports mapping an object from one type to another type. +> Notes: +> +> 1. This mapping executes a deep copy: all mapped fields are serialized into binary and + deserialized from that binary to map into another type. +> 2. All struct types must be registered with the same ID, otherwise Fory cannot map to the correct struct type. + > Be careful when you use `Fory#register(Class)`, because Fory allocates an auto-incremented ID which might be + > inconsistent if you register classes in a different order in different Fory instances. 
+ +```java +public class StructMappingExample { + static class Struct1 { + int f1; + String f2; + + public Struct1(int f1, String f2) { + this.f1 = f1; + this.f2 = f2; + } + } + + static class Struct2 { + int f1; + String f2; + double f3; + } + + static ThreadSafeFory fory1 = Fory.builder() + .withCompatibleMode(CompatibleMode.COMPATIBLE).buildThreadSafeFory(); + static ThreadSafeFory fory2 = Fory.builder() + .withCompatibleMode(CompatibleMode.COMPATIBLE).buildThreadSafeFory(); + + static { + fory1.register(Struct1.class); + fory2.register(Struct2.class); + } + + public static void main(String[] args) { + Struct1 struct1 = new Struct1(10, "abc"); + Struct2 struct2 = (Struct2) fory2.deserialize(fory1.serialize(struct1)); + Assert.assertEquals(struct2.f1, struct1.f1); + Assert.assertEquals(struct2.f2, struct1.f2); + struct1 = (Struct1) fory1.deserialize(fory2.serialize(struct2)); + Assert.assertEquals(struct1.f1, struct2.f1); + Assert.assertEquals(struct1.f2, struct2.f2); + } +} +``` + +## Migration + +### JDK migration + +If you used JDK serialization before and cannot upgrade your client and server at the same time, which is common for +online applications, Fory provides a utility method `org.apache.fory.serializer.JavaSerializer.serializedByJDK` to check +whether +the binary was generated by JDK serialization. You can use the following pattern to make existing serialization protocol-aware, +and then upgrade serialization to Fory in an asynchronous, rolling way: + +```java +if (JavaSerializer.serializedByJDK(bytes)) { + ObjectInputStream objectInputStream = xxx; + return objectInputStream.readObject(); +} else { + return fory.deserialize(bytes); +} +``` + +### Upgrade fory + +Currently, binary compatibility is ensured for minor versions only. For example, if you are using Fory `v0.2.0`, binary +compatibility is +provided if you upgrade to Fory `v0.2.1`. But if you upgrade to Fory `v0.4.1`, binary compatibility is not ensured.
+Most of the time there is no need to upgrade Fory to a newer major version: the current version is fast and compact +enough, +and we provide minor fixes for recent older versions. + +But if you do want to upgrade Fory for better performance and smaller size, you need to write the Fory version as a header to +the serialized data, +using code like the following to keep binary compatibility: + +```java +MemoryBuffer buffer = xxx; +buffer.writeVarInt32(2); +fory.serialize(buffer, obj); +``` + +Then for deserialization, you need: + +```java +MemoryBuffer buffer = xxx; +int foryVersion = buffer.readVarInt32(); +Fory fory = getFory(foryVersion); +fory.deserialize(buffer); +``` + +`getFory` is a method that loads the corresponding Fory. You can shade and relocate different versions of Fory to different +packages, and load Fory by version. + +If you upgrade Fory by a minor version, or you won't have data serialized by an older Fory, you can upgrade Fory directly; +there is no need to version the data. + +## Troubleshooting + +### Class inconsistency and class version check + +If you create Fory without setting `CompatibleMode` to `org.apache.fory.config.CompatibleMode.COMPATIBLE`, and you get a +strange +serialization error, it may be caused by class inconsistency between the serialization peer and the deserialization peer. + +In such cases, you can invoke `ForyBuilder#withClassVersionCheck` when creating Fory to validate it. If deserialization +throws `org.apache.fory.exception.ClassNotCompatibleException`, the classes are inconsistent, and you should create +Fory with +`ForyBuilder#withCompatibleMode(CompatibleMode.COMPATIBLE)`. + +`CompatibleMode.COMPATIBLE` has a higher performance and space cost, so do not set it by default if your classes are always +consistent between serialization and deserialization. + +### Deserialize POJO into another type + +Fory allows you to serialize one POJO and deserialize it into a different POJO. Here, a different POJO means one whose schema is inconsistent with the original.
Users must configure Fory with +`CompatibleMode` set to `org.apache.fory.config.CompatibleMode.COMPATIBLE`. + +```java +public class DeserializeIntoType { + static class Struct1 { + int f1; + String f2; + + public Struct1(int f1, String f2) { + this.f1 = f1; + this.f2 = f2; + } + } + + static class Struct2 { + int f1; + String f2; + double f3; + } + + static ThreadSafeFory fory = Fory.builder() + .withCompatibleMode(CompatibleMode.COMPATIBLE).buildThreadSafeFory(); + + public static void main(String[] args) { + Struct1 struct1 = new Struct1(10, "abc"); + byte[] data = fory.serializeJavaObject(struct1); + Struct2 struct2 = (Struct2) fory.deserializeJavaObject(data, Struct2.class); + } +} +``` + +### Using the wrong API for deserialization + +If you serialize an object by invoking `Fory#serialize`, you should invoke `Fory#deserialize` for deserialization +instead of +`Fory#deserializeJavaObject`. + +If you serialize an object by invoking `Fory#serializeJavaObject`, you should invoke `Fory#deserializeJavaObject` for +deserialization instead of `Fory#deserializeJavaObjectAndClass`/`Fory#deserialize`. + +If you serialize an object by invoking `Fory#serializeJavaObjectAndClass`, you should +invoke `Fory#deserializeJavaObjectAndClass` for deserialization instead +of `Fory#deserializeJavaObject`/`Fory#deserialize`. diff --git a/docs/guide/row_format_guide.md b/docs/guide/row_format_guide.md new file mode 100644 index 00000000..5d739e80 --- /dev/null +++ b/docs/guide/row_format_guide.md @@ -0,0 +1,154 @@ +--- +title: Row Format Guide +sidebar_position: 1 +id: row_format_guide +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License.
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +## Row format protocol + +### Java + +```java +public class Bar { + String f1; + List<Long> f2; +} + +public class Foo { + int f1; + List<Integer> f2; + Map<String, Integer> f3; + List<Bar> f4; +} + +RowEncoder<Foo> encoder = Encoders.bean(Foo.class); +Foo foo = new Foo(); +foo.f1 = 10; +foo.f2 = IntStream.range(0, 1000000).boxed().collect(Collectors.toList()); +foo.f3 = IntStream.range(0, 1000000).boxed().collect(Collectors.toMap(i -> "k"+i, i->i)); +List<Bar> bars = new ArrayList<>(1000000); +for (int i = 0; i < 1000000; i++) { + Bar bar = new Bar(); + bar.f1 = "s"+i; + bar.f2 = LongStream.range(0, 10).boxed().collect(Collectors.toList()); + bars.add(bar); +} +foo.f4 = bars; +// Can be zero-copy read by python +BinaryRow binaryRow = encoder.toRow(foo); +// can be data from python +Foo newFoo = encoder.fromRow(binaryRow); +// zero-copy read List<Integer> f2 +BinaryArray binaryArray2 = binaryRow.getArray(1); +// zero-copy read List<Bar> f4 +BinaryArray binaryArray4 = binaryRow.getArray(3); +// zero-copy read 11th element of `readList<Bar> f4` +BinaryRow barStruct = binaryArray4.getStruct(10); + +// zero-copy read 6th of f2 of 11th element of `readList<Bar> f4` +barStruct.getArray(1).getInt64(5); +RowEncoder<Bar> barEncoder = Encoders.bean(Bar.class); +// deserialize part of data. 
+Bar newBar = barEncoder.fromRow(barStruct); +Bar newBar2 = barEncoder.fromRow(binaryArray4.getStruct(20)); +``` + +### Python + +```python +@dataclass +class Bar: + f1: str + f2: List[pa.int64] +@dataclass +class Foo: + f1: pa.int32 + f2: List[pa.int32] + f3: Dict[str, pa.int32] + f4: List[Bar] + +encoder = pyfory.encoder(Foo) +foo = Foo(f1=10, f2=list(range(1000_000)), + f3={f"k{i}": i for i in range(1000_000)}, + f4=[Bar(f1=f"s{i}", f2=list(range(10))) for i in range(1000_000)]) +binary: bytes = encoder.to_row(foo).to_bytes() +print(f"start: {datetime.datetime.now()}") +foo_row = pyfory.RowData(encoder.schema, binary) +print(foo_row.f2[100000], foo_row.f4[100000].f1, foo_row.f4[200000].f2[5]) +print(f"end: {datetime.datetime.now()}") + +binary = pickle.dumps(foo) +print(f"pickle start: {datetime.datetime.now()}") +new_foo = pickle.loads(binary) +print(new_foo.f2[100000], new_foo.f4[100000].f1, new_foo.f4[200000].f2[5]) +print(f"pickle end: {datetime.datetime.now()}") +``` + +### Apache Arrow Support + +Fory Format also supports automatic conversion from/to Arrow Table/RecordBatch. 
+ +Java: + +```java +Schema schema = TypeInference.inferSchema(BeanA.class); +ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema); +Encoder<BeanA> encoder = Encoders.rowEncoder(BeanA.class); +for (int i = 0; i < 10; i++) { + BeanA beanA = BeanA.createBeanA(2); + arrowWriter.write(encoder.toRow(beanA)); +} +return arrowWriter.finishAsRecordBatch(); +``` + +Python: + +```python +import pyfory +encoder = pyfory.encoder(Foo) +encoder.to_arrow_record_batch([foo] * 10000) +encoder.to_arrow_table([foo] * 10000) +``` + +C++ + +```c++ +std::shared_ptr<ArrowWriter> arrow_writer; +EXPECT_TRUE( + ArrowWriter::Make(schema, ::arrow::default_memory_pool(), &arrow_writer) + .ok()); +for (auto &row : rows) { + EXPECT_TRUE(arrow_writer->Write(row).ok()); +} +std::shared_ptr<::arrow::RecordBatch> record_batch; +EXPECT_TRUE(arrow_writer->Finish(&record_batch).ok()); +EXPECT_TRUE(record_batch->Validate().ok()); +EXPECT_EQ(record_batch->num_columns(), schema->num_fields()); +EXPECT_EQ(record_batch->num_rows(), row_nums); +``` + +```java +Schema schema = TypeInference.inferSchema(BeanA.class); +ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema); +Encoder<BeanA> encoder = Encoders.rowEncoder(BeanA.class); +for (int i = 0; i < 10; i++) { + BeanA beanA = BeanA.createBeanA(2); + arrowWriter.write(encoder.toRow(beanA)); +} +return arrowWriter.finishAsRecordBatch(); +``` diff --git a/docs/guide/scala_guide.md b/docs/guide/scala_guide.md new file mode 100644 index 00000000..315d3c8d --- /dev/null +++ b/docs/guide/scala_guide.md @@ -0,0 +1,170 @@ +--- +title: Scala Serialization Guide +sidebar_position: 4 +id: scala_guide +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. 
+ The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +Fory supports serialization of all Scala objects: + +- `case` class serialization supported +- `pojo/bean` class serialization supported +- `object` singleton serialization supported +- `collection` serialization supported +- other types such as `tuple/either` and basic types are all supported too. + +Scala 2 and 3 are both supported. + +## Install + +To add a dependency on Fory Scala for Scala 2 with sbt, use the following: + +```sbt +libraryDependencies += "org.apache.fory" % "fory-scala_2.13" % "0.10.3" +``` + +To add a dependency on Fory Scala for Scala 3 with sbt, use the following: + +```sbt +libraryDependencies += "org.apache.fory" % "fory-scala_3" % "0.10.3" +``` + +## Quick Start + +```scala +import org.apache.fory.Fory +import org.apache.fory.serializer.scala.ScalaSerializers + +case class Person(name: String, id: Long, github: String) +case class Point(x : Int, y : Int, z : Int) + +object ScalaExample { + val fory: Fory = Fory.builder().withScalaOptimizationEnabled(true).build() + // Register optimized fory serializers for scala + ScalaSerializers.registerSerializers(fory) + fory.register(classOf[Person]) + fory.register(classOf[Point]) + + def main(args: Array[String]): Unit = { + val p = Person("Shawn Yang", 1, "https://github.com/chaokunyang") + println(fory.deserialize(fory.serialize(p))) + println(fory.deserialize(fory.serialize(Point(1, 2, 3)))) + } +} +``` + +## Fory creation + +When using Fory for Scala serialization, you should create Fory with at least the following options: + 
+```scala +import org.apache.fory.Fory +import org.apache.fory.serializer.scala.ScalaSerializers + +val fory = Fory.builder().withScalaOptimizationEnabled(true).build() + +// Register optimized fory serializers for scala +ScalaSerializers.registerSerializers(fory) +``` + +Depending on the object types you serialize, you may need to register some Scala internal types: + +```scala +fory.register(Class.forName("scala.Enumeration.Val")) +``` + +If you want to avoid such registration, you can disable class registration with `ForyBuilder#requireClassRegistration(false)`. +Note that this option allows deserializing objects of unknown types, which is more flexible but may be insecure if the classes contain malicious code. + +Circular references are common in Scala, so reference tracking should be enabled with `ForyBuilder#withRefTracking(true)`. If you don't enable reference tracking, a [StackOverflowError](https://github.com/apache/fory/issues/1032) may happen for some Scala versions when serializing a Scala Enumeration. + +Note that the Fory instance should be shared between multiple serializations, because creating a Fory instance is not cheap. + +If you share a Fory instance across multiple threads, you should create a `ThreadSafeFory` with `ForyBuilder#buildThreadSafeFory()` instead.
+ +## Serialize case class + +```scala +case class Person(github: String, age: Int, id: Long) +val p = Person("https://github.com/chaokunyang", 18, 1) +println(fory.deserialize(fory.serialize(p))) +println(fory.deserializeJavaObject(fory.serializeJavaObject(p), classOf[Person])) +``` + +## Serialize pojo + +```scala +class Foo(f1: Int, f2: String) { + override def toString: String = s"Foo($f1, $f2)" +} +// `new` is required for a non-case class in Scala 2. +println(fory.deserialize(fory.serialize(new Foo(1, "chaokunyang")))) +``` + +## Serialize object singleton + +```scala +object singleton { +} +val o1 = fory.deserialize(fory.serialize(singleton)) +val o2 = fory.deserialize(fory.serialize(singleton)) +println(o1 == o2) +``` + +## Serialize collection + +```scala +val seq = Seq(1,2) +val list = List("a", "b") +val map = Map("a" -> 1, "b" -> 2) +println(fory.deserialize(fory.serialize(seq))) +println(fory.deserialize(fory.serialize(list))) +println(fory.deserialize(fory.serialize(map))) +``` + +## Serialize Tuple + +```scala +val tuple2 = Tuple2(100, 10000L) +println(fory.deserialize(fory.serialize(tuple2))) +val tuple4 = Tuple4(100, 10000L, 10000L, "str") +println(fory.deserialize(fory.serialize(tuple4))) +``` + +## Serialize Enum + +### Scala3 Enum + +```scala +enum Color { case Red, Green, Blue } +println(fory.deserialize(fory.serialize(Color.Green))) +``` + +### Scala2 Enum + +```scala +object ColorEnum extends Enumeration { + type ColorEnum = Value + val Red, Green, Blue = Value +} +println(fory.deserialize(fory.serialize(ColorEnum.Green))) +``` + +## Serialize Option + +```scala +val opt: Option[Long] = Some(100) +println(fory.deserialize(fory.serialize(opt))) +val opt1: Option[Long] = None +println(fory.deserialize(fory.serialize(opt1))) +``` diff --git a/docs/guide/xlang_serialization_guide.md b/docs/guide/xlang_serialization_guide.md new file mode 100644 index 00000000..829b04f5 --- /dev/null +++ b/docs/guide/xlang_serialization_guide.md @@ -0,0 +1,612 @@ +--- +title: Xlang Serialization Guide +sidebar_position: 2 +id:
xlang_object_graph_guide +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +## Cross-language object graph serialization + +### Serialize built-in types + +Common types can be serialized automatically: primitive numeric types, string, binary, array, list, map and so on. + +**Java** + +```java +import org.apache.fory.*; +import org.apache.fory.config.*; + +import java.util.*; + +public class Example1 { + public static void main(String[] args) { + Fory fory = Fory.builder().withLanguage(Language.XLANG).build(); + List<Object> list = ofArrayList(true, false, "str", -1.1, 1, new int[100], new double[20]); + byte[] bytes = fory.serialize(list); + // bytes can be data serialized by other languages. + fory.deserialize(bytes); + Map<Object, Object> map = new HashMap<>(); + map.put("k1", "v1"); + map.put("k2", list); + map.put("k3", -1); + bytes = fory.serialize(map); + // bytes can be data serialized by other languages. + fory.deserialize(bytes); + } +} +``` + +**Python** + +```python +import pyfory +import numpy as np + +fory = pyfory.Fory() +object_list = [True, False, "str", -1.1, 1, + np.full(100, 0, dtype=np.int32), np.full(20, 0.0, dtype=np.double)] +data = fory.serialize(object_list) +# bytes can be data serialized by other languages. 
+new_list = fory.deserialize(data) +object_map = {"k1": "v1", "k2": object_list, "k3": -1} +data = fory.serialize(object_map) +# bytes can be data serialized by other languages. +new_map = fory.deserialize(data) +print(new_map) +``` + +**Golang** + +```go +package main + +import forygo "github.com/apache/fory/fory/go/fory" +import "fmt" + +func main() { + list := []interface{}{true, false, "str", -1.1, 1, make([]int32, 10), make([]float64, 20)} + fory := forygo.NewFory() + bytes, err := fory.Marshal(list) + if err != nil { + panic(err) + } + var newValue interface{} + // bytes can be data serialized by other languages. + if err := fory.Unmarshal(bytes, &newValue); err != nil { + panic(err) + } + fmt.Println(newValue) + dict := map[string]interface{}{ + "k1": "v1", + "k2": list, + "k3": -1, + } + bytes, err = fory.Marshal(dict) + if err != nil { + panic(err) + } + // bytes can be data serialized by other languages. + if err := fory.Unmarshal(bytes, &newValue); err != nil { + panic(err) + } + fmt.Println(newValue) +} +``` + +**JavaScript** + +```javascript +import Fory from '@foryjs/fory'; + +/** + * @foryjs/hps use v8's fast-calls-api that can be called directly by jit, ensure that the version of Node is 20 or above. 
+ * Experimental feature, installation success cannot be guaranteed at this moment + * If you are unable to install the module, replace it with `const hps = null;` + **/ +import hps from '@foryjs/hps'; + +const fory = new Fory({ hps }); +const input = fory.serialize('hello fory'); +const result = fory.deserialize(input); +console.log(result); +``` + +**Rust** + +```rust +use chrono::{NaiveDate, NaiveDateTime}; +use fory::{from_buffer, to_buffer, Fory}; +use std::collections::HashMap; + +fn run() { + let bin: Vec<u8> = to_buffer(&"hello".to_string()); + let obj: String = from_buffer(&bin).expect("should success"); + assert_eq!("hello".to_string(), obj); +} +``` + +### Serialize custom types + +Serializing user-defined types needs registering the custom type using the register API to establish the mapping relationship between the type in different languages. + +**Java** + +```java +import org.apache.fory.*; +import org.apache.fory.config.*; +import java.util.*; + +public class Example2 { + public static class SomeClass1 { + Object f1; + Map<Byte, Integer> f2; + } + + public static class SomeClass2 { + Object f1; + String f2; + List<Object> f3; + Map<Byte, Integer> f4; + Byte f5; + Short f6; + Integer f7; + Long f8; + Float f9; + Double f10; + short[] f11; + List<Short> f12; + } + + public static Object createObject() { + SomeClass1 obj1 = new SomeClass1(); + obj1.f1 = true; + obj1.f2 = ofHashMap((byte) -1, 2); + SomeClass2 obj = new SomeClass2(); + obj.f1 = obj1; + obj.f2 = "abc"; + obj.f3 = ofArrayList("abc", "abc"); + obj.f4 = ofHashMap((byte) 1, 2); + obj.f5 = Byte.MAX_VALUE; + obj.f6 = Short.MAX_VALUE; + obj.f7 = Integer.MAX_VALUE; + obj.f8 = Long.MAX_VALUE; + obj.f9 = 1.0f / 2; + obj.f10 = 1 / 3.0; + obj.f11 = new short[]{(short) 1, (short) 2}; + obj.f12 = ofArrayList((short) -1, (short) 4); + return obj; + } + + // mvn exec:java -Dexec.mainClass="org.apache.fory.examples.Example2" + public static void main(String[] args) { + Fory fory = 
Fory.builder().withLanguage(Language.XLANG).build(); + fory.register(SomeClass1.class, "example.SomeClass1"); + fory.register(SomeClass2.class, "example.SomeClass2"); + byte[] bytes = fory.serialize(createObject()); + // bytes can be data serialized by other languages. + System.out.println(fory.deserialize(bytes)); + } +} +``` + +**Python** + +```python +from dataclasses import dataclass +from typing import List, Dict, Any +import pyfory, array + + +@dataclass +class SomeClass1: + f1: Any + f2: Dict[pyfory.Int8Type, pyfory.Int32Type] + + +@dataclass +class SomeClass2: + f1: Any = None + f2: str = None + f3: List[str] = None + f4: Dict[pyfory.Int8Type, pyfory.Int32Type] = None + f5: pyfory.Int8Type = None + f6: pyfory.Int16Type = None + f7: pyfory.Int32Type = None + # int type will be taken as `pyfory.Int64Type`. + # use `pyfory.Int32Type` for type hint if peer + # are more narrow type. + f8: int = None + f9: pyfory.Float32Type = None + # float type will be taken as `pyfory.Float64Type` + f10: float = None + f11: pyfory.Int16ArrayType = None + f12: List[pyfory.Int16Type] = None + + +if __name__ == "__main__": + f = pyfory.Fory() + f.register_type(SomeClass1, typename="example.SomeClass1") + f.register_type(SomeClass2, typename="example.SomeClass2") + obj1 = SomeClass1(f1=True, f2={-1: 2}) + obj = SomeClass2( + f1=obj1, + f2="abc", + f3=["abc", "abc"], + f4={1: 2}, + f5=2 ** 7 - 1, + f6=2 ** 15 - 1, + f7=2 ** 31 - 1, + f8=2 ** 63 - 1, + f9=1.0 / 2, + f10=1 / 3.0, + f11=array.array("h", [1, 2]), + f12=[-1, 4], + ) + data = f.serialize(obj) + # bytes can be data serialized by other languages. 
+ print(f.deserialize(data)) +``` + +**Golang** + +```go +package main + +import forygo "github.com/apache/fory/fory/go/fory" +import "fmt" +import "math" + +func main() { + type SomeClass1 struct { + F1 interface{} + F2 map[int8]int32 + } + + type SomeClass2 struct { + F1 interface{} + F2 string + F3 []interface{} + F4 map[int8]int32 + F5 int8 + F6 int16 + F7 int32 + F8 int64 + F9 float32 + F10 float64 + F11 []int16 + F12 forygo.Int16Slice + } + fory := forygo.NewFory() + if err := fory.RegisterTagType("example.SomeClass1", SomeClass1{}); err != nil { + panic(err) + } + if err := fory.RegisterTagType("example.SomeClass2", SomeClass2{}); err != nil { + panic(err) + } + obj1 := &SomeClass1{} + obj1.F1 = true + obj1.F2 = map[int8]int32{-1: 2} + obj := &SomeClass2{} + obj.F1 = obj1 + obj.F2 = "abc" + obj.F3 = []interface{}{"abc", "abc"} + f4 := map[int8]int32{1: 2} + obj.F4 = f4 + obj.F5 = math.MaxInt8 + obj.F6 = math.MaxInt16 + obj.F7 = math.MaxInt32 + obj.F8 = math.MaxInt64 + obj.F9 = 1.0 / 2 + obj.F10 = 1 / 3.0 + obj.F11 = []int16{1, 2} + obj.F12 = []int16{-1, 4} + bytes, err := fory.Marshal(obj) + if err != nil { + panic(err) + } + var newValue interface{} + // bytes can be data serialized by other languages. + if err := fory.Unmarshal(bytes, &newValue); err != nil { + panic(err) + } + fmt.Println(newValue) +} +``` + +**JavaScript** + +```javascript +import Fory, { Type, InternalSerializerType } from '@foryjs/fory'; + +/** + * @foryjs/hps use v8's fast-calls-api that can be called directly by jit, ensure that the version of Node is 20 or above. + * Experimental feature, installation success cannot be guaranteed at this moment + * If you are unable to install the module, replace it with `const hps = null;` + **/ +import hps from '@foryjs/hps'; + +// Now we describe data structures using JSON, but in the future, we will use more ways.
+const description = Type.object('example.foo', { + foo: Type.string(), +}); +const fory = new Fory({ hps }); +const { serialize, deserialize } = fory.registerSerializer(description); +const input = serialize({ foo: 'hello fory' }); +const result = deserialize(input); +console.log(result); +``` + +**Rust** + +```rust +use chrono::{NaiveDate, NaiveDateTime}; +use fory::{from_buffer, to_buffer, Fory}; +use std::collections::HashMap; + +#[test] +fn complex_struct() { + #[derive(Fory, Debug, PartialEq)] + #[tag("example.foo2")] + struct Animal { + category: String, + } + + #[derive(Fory, Debug, PartialEq)] + #[tag("example.foo")] + struct Person { + c1: Vec<u8>, // binary + c2: Vec<i16>, // primitive array + animal: Vec<Animal>, + c3: Vec<Vec<u8>>, + name: String, + c4: HashMap<String, String>, + age: u16, + op: Option<String>, + op2: Option<String>, + date: NaiveDate, + time: NaiveDateTime, + c5: f32, + c6: f64, + } + let person: Person = Person { + c1: vec![1, 2, 3], + c2: vec![5, 6, 7], + c3: vec![vec![1, 2], vec![1, 3]], + animal: vec![Animal { + category: "Dog".to_string(), + }], + c4: HashMap::from([ + ("hello1".to_string(), "hello2".to_string()), + ("hello2".to_string(), "hello3".to_string()), + ]), + age: 12, + name: "helo".to_string(), + op: Some("option".to_string()), + op2: None, + date: NaiveDate::from_ymd_opt(2025, 12, 12).unwrap(), + time: NaiveDateTime::from_timestamp_opt(1689912359, 0).unwrap(), + c5: 2.0, + c6: 4.0, + }; + + let bin: Vec<u8> = to_buffer(&person); + let obj: Person = from_buffer(&bin).expect("should success"); + assert_eq!(person, obj); +} +``` + +### Serialize Shared Reference and Circular Reference + +Shared reference and circular reference can be serialized automatically, no duplicate data or recursion error. 
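
Conceptually, reference tracking works by remembering every object that has already been written and emitting a back-reference when the same object is seen again. The sketch below illustrates that idea with plain JDK collections only. It is not Fory's implementation: `Node`, `encode`, and the textual `obj:`/`ref:` output are all made up for this illustration.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class RefTrackingSketch {
  // Hypothetical node type used only for this illustration.
  static class Node {
    String name;
    Node next;

    Node(String name) {
      this.name = name;
    }
  }

  // Walks the chain, writing each distinct object once and a "ref:<id>" marker on a revisit.
  static List<String> encode(Node root) {
    // Identity-based map: two objects are "the same" only if they are the same instance.
    Map<Node, Integer> seen = new IdentityHashMap<>();
    List<String> out = new ArrayList<>();
    Node cur = root;
    while (cur != null) {
      Integer id = seen.get(cur);
      if (id != null) {
        out.add("ref:" + id); // already written: emit a back-reference and stop
        break;
      }
      seen.put(cur, seen.size());
      out.add("obj:" + cur.name);
      cur = cur.next;
    }
    return out;
  }

  public static void main(String[] args) {
    Node a = new Node("a");
    Node b = new Node("b");
    a.next = b;
    b.next = a; // cycle back to the first node
    System.out.println(encode(a)); // prints [obj:a, obj:b, ref:0]
  }
}
```

Without the `seen` map the loop would never terminate on the cycle, which is the "recursion error" that reference tracking avoids. The same bookkeeping is what prevents a shared object from being written twice.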
+ +**Java** + +```java +import org.apache.fory.*; +import org.apache.fory.config.*; +import java.util.*; + +public class ReferenceExample { + public static class SomeClass { + SomeClass f1; + Map<String, String> f2; + Map<String, String> f3; + } + + public static Object createObject() { + SomeClass obj = new SomeClass(); + obj.f1 = obj; + obj.f2 = ofHashMap("k1", "v1", "k2", "v2"); + obj.f3 = obj.f2; + return obj; + } + + // mvn exec:java -Dexec.mainClass="org.apache.fory.examples.ReferenceExample" + public static void main(String[] args) { + Fory fory = Fory.builder().withLanguage(Language.XLANG) + .withRefTracking(true).build(); + fory.register(SomeClass.class, "example.SomeClass"); + byte[] bytes = fory.serialize(createObject()); + // bytes can be data serialized by other languages. + System.out.println(fory.deserialize(bytes)); + } +} +``` + +**Python** + +```python +from typing import Dict +import pyfory + +class SomeClass: + f1: "SomeClass" + f2: Dict[str, str] + f3: Dict[str, str] + +fory = pyfory.Fory(ref_tracking=True) +fory.register_type(SomeClass, typename="example.SomeClass") +obj = SomeClass() +obj.f2 = {"k1": "v1", "k2": "v2"} +obj.f1, obj.f3 = obj, obj.f2 +data = fory.serialize(obj) +# bytes can be data serialized by other languages. +print(fory.deserialize(data)) +``` + +**Golang** + +```go +package main + +import forygo "github.com/apache/fory/fory/go/fory" +import "fmt" + +func main() { + type SomeClass struct { + F1 *SomeClass + F2 map[string]string + F3 map[string]string + } + fory := forygo.NewFory(true) + if err := fory.RegisterTagType("example.SomeClass", SomeClass{}); err != nil { + panic(err) + } + value := &SomeClass{F2: map[string]string{"k1": "v1", "k2": "v2"}} + value.F3 = value.F2 + value.F1 = value + bytes, err := fory.Marshal(value) + if err != nil { + panic(err) + } + var newValue interface{} + // bytes can be data serialized by other languages.
+ if err := fory.Unmarshal(bytes, &newValue); err != nil { + panic(err) + } + fmt.Println(newValue) +} +``` + +**JavaScript** + +```javascript +import Fory, { Type } from '@foryjs/fory'; +/** + * @foryjs/hps use v8's fast-calls-api that can be called directly by jit, ensure that the version of Node is 20 or above. + * Experimental feature, installation success cannot be guaranteed at this moment + * If you are unable to install the module, replace it with `const hps = null;` + **/ +import hps from '@foryjs/hps'; + +const description = Type.object('example.foo', { + foo: Type.string(), + bar: Type.object('example.foo'), +}); + +const fory = new Fory({ hps }); +const { serialize, deserialize } = fory.registerSerializer(description); +const data = { + foo: 'hello fory', +}; +data.bar = data; +const input = serialize(data); +const result = deserialize(input); +console.log(result.bar.foo === result.foo); +``` + +**Rust** + +Reference serialization cannot be implemented in Rust because of Rust's ownership restrictions. + +### Zero-Copy Serialization + +**Java** + +```java +import org.apache.fory.*; +import org.apache.fory.config.*; +import org.apache.fory.serializer.BufferObject; +import org.apache.fory.memory.MemoryBuffer; + +import java.util.*; +import java.util.stream.Collectors; + +public class ZeroCopyExample { + // mvn exec:java -Dexec.mainClass="io.ray.fory.examples.ZeroCopyExample" + public static void main(String[] args) { + Fory fory = Fory.builder().withLanguage(Language.XLANG).build(); + List<Object> list = ofArrayList("str", new byte[1000], new int[100], new double[100]); + Collection<BufferObject> bufferObjects = new ArrayList<>(); + byte[] bytes = fory.serialize(list, e -> !bufferObjects.add(e)); + // bytes can be data serialized by other languages.
+ List<MemoryBuffer> buffers = bufferObjects.stream() + .map(BufferObject::toBuffer).collect(Collectors.toList()); + System.out.println(fory.deserialize(bytes, buffers)); + } +} +``` + +**Python** + +```python +import array +import pyfory +import numpy as np + +fory = pyfory.Fory() +list_ = ["str", bytes(bytearray(1000)), + array.array("i", range(100)), np.full(100, 0.0, dtype=np.double)] +serialized_objects = [] +data = fory.serialize(list_, buffer_callback=serialized_objects.append) +buffers = [o.to_buffer() for o in serialized_objects] +# bytes can be data serialized by other languages. +print(fory.deserialize(data, buffers=buffers)) +``` + +**Golang** + +```go +package main + +import forygo "github.com/apache/fory/fory/go/fory" +import "fmt" + +func main() { + fory := forygo.NewFory() + list := []interface{}{"str", make([]byte, 1000)} + buf := fory.NewByteBuffer(nil) + var bufferObjects []fory.BufferObject + fory.Serialize(buf, list, func(o fory.BufferObject) bool { + bufferObjects = append(bufferObjects, o) + return false + }) + var newList []interface{} + var buffers []*fory.ByteBuffer + for _, o := range bufferObjects { + buffers = append(buffers, o.ToBuffer()) + } + if err := fory.Deserialize(buf, &newList, buffers); err != nil { + panic(err) + } + fmt.Println(newList) +} +``` + +**JavaScript** + +```javascript +// Coming soon +``` diff --git a/docs/guide/xlang_type_mapping.md b/docs/guide/xlang_type_mapping.md new file mode 100644 index 00000000..72781347 --- /dev/null +++ b/docs/guide/xlang_type_mapping.md @@ -0,0 +1,116 @@ +--- +title: Type Mapping of Xlang Serialization +sidebar_position: 3 +id: xlang_type_mapping +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. 
+ The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +Note: + +- For type definition, see [Type Systems in Spec](../specification/xlang_serialization_spec.md#type-systems) +- `int16_t[n]/vector<T>` indicates `int16_t[n]/vector<int16_t>` +- Cross-language serialization is not stable yet; do not use it in your production environment. + +## Type Mapping + +| Fory Type | Fory Type ID | Java | Python | JavaScript | C++ | Golang | Rust | +|-------------------------|--------------|-----------------|-----------------------------------|-----------------|--------------------------------|------------------|------------------| +| bool | 1 | boolean/Boolean | bool | Boolean | bool | bool | bool | +| int8 | 2 | byte/Byte | int/pyfory.Int8 | Type.int8() | int8_t | int8 | i8 | +| int16 | 3 | short/Short | int/pyfory.Int16 | Type.int16() | int16_t | int16 | i16 | +| int32 | 4 | int/Integer | int/pyfory.Int32 | Type.int32() | int32_t | int32 | i32 | +| var_int32 | 5 | int/Integer | int/pyfory.VarInt32 | Type.varint32() | fory::varint32_t | fory.varint32 | fory::varint32 | +| int64 | 6 | long/Long | int/pyfory.Int64 | Type.int64() | int64_t | int64 | i64 | +| var_int64 | 7 | long/Long | int/pyfory.VarInt64 | Type.varint64() | fory::varint64_t | fory.varint64 | fory::varint64 | +| sli_int64 | 8 | long/Long | int/pyfory.SliInt64 | Type.sliint64() | fory::sliint64_t | fory.sliint64 | fory::sliint64 | +| float16 | 9 | float/Float | float/pyfory.Float16 | Type.float16() |
fory::float16_t | fory.float16 | fory::f16 | +| float32 | 10 | float/Float | float/pyfory.Float32 | Type.float32() | float | float32 | f32 | +| float64 | 11 | double/Double | float/pyfory.Float64 | Type.float64() | double | float64 | f64 | +| string | 12 | String | str | String | string | string | String/str | +| enum | 13 | Enum subclasses | enum subclasses | / | enum | / | enum | +| named_enum | 14 | Enum subclasses | enum subclasses | / | enum | / | enum | +| struct | 15 | pojo/record | data class / type with type hints | object | struct/class | struct | struct | +| compatible_struct | 16 | pojo/record | data class / type with type hints | object | struct/class | struct | struct | +| named_struct | 17 | pojo/record | data class / type with type hints | object | struct/class | struct | struct | +| named_compatible_struct | 18 | pojo/record | data class / type with type hints | object | struct/class | struct | struct | +| ext | 19 | pojo/record | data class / type with type hints | object | struct/class | struct | struct | +| named_ext | 20 | pojo/record | data class / type with type hints | object | struct/class | struct | struct | +| list | 21 | List/Collection | list/tuple | array | vector | slice | Vec | +| set | 22 | Set | set | / | set | fory.Set | Set | +| map | 23 | Map | dict | Map | unordered_map | map | HashMap | +| duration | 24 | Duration | timedelta | Number | duration | Duration | Duration | +| timestamp | 25 | Instant | datetime | Number | std::chrono::nanoseconds | Time | DateTime | +| local_date | 26 | Date | datetime | Number | std::chrono::nanoseconds | Time | DateTime | +| decimal | 27 | BigDecimal | Decimal | bigint | / | / | / | +| binary | 28 | byte[] | bytes | / | `uint8_t[n]/vector<T>` | `[n]uint8/[]T` | `Vec<uint8_t>` | +| array | 29 | array | np.ndarray | / | / | array/slice | Vec | +| bool_array | 30 | bool[] | ndarray(np.bool_) | / | `bool[n]` | `[n]bool/[]T` | `Vec<bool>` | +| int8_array | 31 | byte[] | ndarray(int8) | / | 
`int8_t[n]/vector<T>` | `[n]int8/[]T` | `Vec<i18>` | +| int16_array | 32 | short[] | ndarray(int16) | / | `int16_t[n]/vector<T>` | `[n]int16/[]T` | `Vec<i16>` | +| int32_array | 33 | int[] | ndarray(int32) | / | `int32_t[n]/vector<T>` | `[n]int32/[]T` | `Vec<i32>` | +| int64_array | 34 | long[] | ndarray(int64) | / | `int64_t[n]/vector<T>` | `[n]int64/[]T` | `Vec<i64>` | +| float16_array | 35 | float[] | ndarray(float16) | / | `fory::float16_t[n]/vector<T>` | `[n]float16/[]T` | `Vec<fory::f16>` | +| float32_array | 36 | float[] | ndarray(float32) | / | `float[n]/vector<T>` | `[n]float32/[]T` | `Vec<f32>` | +| float64_array | 37 | double[] | ndarray(float64) | / | `double[n]/vector<T>` | `[n]float64/[]T` | `Vec<f64>` | +| arrow record batch | 38 | / | / | / | / | / | / | +| arrow table | 39 | / | / | / | / | / | / | + +## Type info(not implemented currently) + +Due to differences between type systems of languages, those types can't be mapped one-to-one between languages. + +If the user notices that one type on a language corresponds to multiple types in Fory type systems, for example, `long` +in java has type `int64/varint64/sliint64`, it means the language lacks some types, and the user must provide extra type +info when using Fory. + +## Type annotation + +If the type is a field of another class, users can provide meta hints for fields of a type, or for the whole type. +Such information can be provided in other languages too: + +- java: use annotation. +- cpp: use macro and template. +- golang: use struct tag. +- python: use typehint. +- rust: use macro. + +Here is en example: + +- Java: + + ```java + class Foo { + @Int32Type(varint = true) + int f1; + List<@Int32Type(varint = true) Integer> f2; + } + ``` + +- Python: + + ```python + class Foo: + f1: Int32Type(varint=True) + f2: List[Int32Type(varint=True)] + ``` + +## Type wrapper + +If the type is not a field of a class, the user must wrap this type with a Fory type to pass the extra type info. 
+
+For example, suppose Fory Java provides a `VarInt64` type. Instead of calling `fory.serialize(long_value)`, the user
+would call `fory.serialize(new VarInt64(long_value))`.
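The type-wrapper idea described in the "Type wrapper" section can be illustrated with a small, self-contained Python sketch. It does not use the Fory API: `FixedInt64`, `VarInt64`, and `encode` are hypothetical stand-ins defined only for this example, and the zigzag plus base-128 varint scheme shown is a generic encoding, not necessarily Fory's wire format. The sketch only shows that the same integer value yields different bytes depending on which wrapper carries it, which is why the extra type info has to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedInt64:
    """Hypothetical wrapper: encode the value as a fixed 8-byte integer."""
    value: int


@dataclass(frozen=True)
class VarInt64:
    """Hypothetical wrapper: encode the value as a variable-length integer."""
    value: int


def zigzag(n: int) -> int:
    # Map a signed 64-bit integer to an unsigned one so that values
    # with a small magnitude (positive or negative) stay small.
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode(obj) -> bytes:
    """Choose an encoding based on the wrapper type (illustration only)."""
    if isinstance(obj, FixedInt64):
        return obj.value.to_bytes(8, "little", signed=True)
    if isinstance(obj, VarInt64):
        u = zigzag(obj.value)
        out = bytearray()
        while True:
            low7 = u & 0x7F
            u >>= 7
            if u:
                out.append(low7 | 0x80)  # more bytes follow
            else:
                out.append(low7)  # last byte
                return bytes(out)
    raise TypeError(f"no type info for {type(obj).__name__}; wrap the value first")


if __name__ == "__main__":
    for wrapped in (FixedInt64(300), VarInt64(300)):
        print(type(wrapped).__name__, len(encode(wrapped)), "bytes")
```

A bare `int` is rejected here on purpose: without a wrapper, this toy encoder cannot tell which of the two encodings the caller wants, which mirrors the situation the section describes for `long` in Java.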
