This is an automated email from the ASF dual-hosted git repository.
chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git
The following commit(s) were added to refs/heads/main by this push:
new ab1387317 docs(python): refine python code and add api doc (#2816)
ab1387317 is described below
commit ab138731795ca3b996845542f6d86b7c84d6a726
Author: Shawn Yang <[email protected]>
AuthorDate: Thu Oct 23 15:13:06 2025 +0800
docs(python): refine python code and add api doc (#2816)
## Why?
## What does this PR do?
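Refine the Python code and add API documentation:
- Rename the Cython module `_serialization.pyx` to `serialization.pyx` and update the imports and build targets that reference it.
- Rename `CrossLanguageCompatibleSerializer` to `XlangCompatibleSerializer`, and `serialize_ref`/`serialize_nonref` to `write_ref`/`write_no_ref`.
- Replace the wildcard imports in `pyfory/__init__.py` with explicit imports and an `__all__` list.
- Add docstrings to `Fory` and its public methods.
- Assert `self.depth == 0` on entry to `_serialize` and `_deserialize` to catch nested top-level calls.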
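
Among the changes in the diff below, `_serialize` and `_deserialize` now assert `self.depth == 0` on entry and increment it, while `reset_write` sets it back to 0. The following is a minimal sketch of that guard pattern only. `MiniSerializer` and `ReentrantSerializer` are hypothetical stand-ins, not pyfory classes, and placing the reset in a `try`/`finally` on the write path is an assumption, since the hunk does not show that part of `serialize`.

```python
class MiniSerializer:
    """Hypothetical stand-in for illustration; not pyfory's Fory class."""

    def __init__(self):
        self.depth = 0
        self.chunks = []

    def serialize(self, obj):
        # Public entry point: always reset state, even if writing fails.
        try:
            return self._serialize(obj)
        finally:
            self.reset_write()

    def _serialize(self, obj):
        # A top-level call must start from a clean state.
        assert self.depth == 0, "Nested serialization should use write_ref."
        self.depth += 1
        self.write_ref(obj)
        return b"".join(self.chunks)

    def write_ref(self, obj):
        # Nested values go through write_ref, never back through serialize().
        if isinstance(obj, (list, tuple)):
            for item in obj:
                self.write_ref(item)
        else:
            self.chunks.append(repr(obj).encode("utf-8"))

    def reset_write(self):
        self.depth = 0
        self.chunks = []


class ReentrantSerializer(MiniSerializer):
    """Deliberately wrong: re-enters the top-level path for nested values."""

    def write_ref(self, obj):
        self._serialize(obj)  # trips the depth assertion
```

With this shape, a custom serializer that calls the top-level entry point from inside a write fails fast with an `AssertionError` instead of silently corrupting shared state, and the `finally` block leaves the instance reusable afterwards. Note that Python drops `assert` statements under `-O`, so the guard is a development aid rather than a runtime guarantee.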
## Related issues
## Does this PR introduce any user-facing change?
- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
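
The diff below renames several identifiers that existing callers may reference. As a rough migration aid, here is a sketch of a regex-based rewrite over source text. `RENAMES` and `migrate_source` are hypothetical helpers, not part of pyfory, and the mapping is collected only from the hunks visible in this email. Whether the old names remain available as aliases is not shown here.

```python
import re

# Old name -> new name, as they appear in the hunks of this commit.
RENAMES = {
    "_serialization": "serialization",
    "CrossLanguageCompatibleSerializer": "XlangCompatibleSerializer",
    "serialize_ref": "write_ref",
    "serialize_nonref": "write_no_ref",
    "Int16ArrayType": "int16_array",
    "Int32ArrayType": "int32_array",
    "Int64ArrayType": "int64_array",
    "Float32ArrayType": "float32_array",
    "Float64ArrayType": "float64_array",
}

# Match whole identifiers only, so names like `my_serialization` are untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_source(text: str) -> str:
    """Return `text` with every old identifier replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)
```

A blanket rewrite like this also touches comments and string literals, so the result should be reviewed before committing.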
## Benchmark
---
AGENTS.md | 4 +-
BUILD | 10 +-
CONTRIBUTING.md | 2 +-
python/CONTRIBUTING.md | 2 +-
python/pyfory/__init__.py | 128 ++++++-
python/pyfory/_fory.py | 280 +++++++++++---
python/pyfory/_registry.py | 2 +-
python/pyfory/_serializer.py | 32 +-
python/pyfory/codegen.py | 20 +-
python/pyfory/format/serializer.py | 6 +-
python/pyfory/format/tests/test_vectorized.py | 4 +-
.../{_serialization.pyx => serialization.pyx} | 392 +++++++++++++++++---
python/pyfory/serializer.py | 410 ++++++++++-----------
python/pyfory/tests/test_cross_language.py | 6 +-
python/pyfory/tests/test_metastring_resolver.py | 2 +-
python/pyfory/tests/test_reduce_serializer.py | 2 +-
python/pyfory/tests/test_serializer.py | 12 +-
python/pyfory/type.py | 20 +-
18 files changed, 950 insertions(+), 384 deletions(-)
diff --git a/AGENTS.md b/AGENTS.md
index 5d6bb2739..f903cbcca 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -389,14 +389,14 @@ Fory python has two implementations for the protocol:
Code structure:
-- `python/pyfory/_serialization.pyx`: Core serialization logic and entry point for cython mode based on `xlang serialization format`
+- `python/pyfory/serialization.pyx`: Core serialization logic and entry point for cython mode based on `xlang serialization format`
- `python/pyfory/_fory.py`: Serialization entry point for pure python mode based on `xlang serialization format`
- `python/pyfory/_registry.py`: Type registry, resolution and serializer dispatch for pure python mode, which is also used by cython mode. Cython mode use a cache to reduce invocations to this module.
- `python/pyfory/serializer.py`: Serializers for non-internal types
- `python/pyfory/includes`: Cython headers for `c++` functions and classes.
- `python/pyfory/resolver.py`: resolving shared/circular references when ref tracking is enabled in pure python mode
- `python/pyfory/format`: Fory row format encoding and decoding, arrow columnar format interoperation
-- `python/pyfory/_util.pyx`: Buffer for reading/writing data, string utilities. Used by `_serialization.pyx` and `python/pyfory/format` at the same time.
+- `python/pyfory/_util.pyx`: Buffer for reading/writing data, string utilities. Used by `serialization.pyx` and `python/pyfory/format` at the same time.
#### Go
diff --git a/BUILD b/BUILD
index ded01df92..1ce2a2871 100644
--- a/BUILD
+++ b/BUILD
@@ -51,11 +51,11 @@ pyx_library(
)
pyx_library(
- name = "_serialization",
+ name = "serialization",
srcs = glob([
"python/pyfory/includes/*.pxd",
"python/pyfory/_util.pxd",
- "python/pyfory/_serialization.pyx",
+ "python/pyfory/serialization.pyx",
"python/pyfory/__init__.py",
]),
cc_kwargs = dict(
@@ -96,7 +96,7 @@ genrule(
":python/pyfory/_util.so",
":python/pyfory/lib/mmh3/mmh3.so",
":python/pyfory/format/_format.so",
- ":python/pyfory/_serialization.so",
+ ":python/pyfory/serialization.so",
],
outs = [
"cp_fory_py_generated.out",
@@ -111,12 +111,12 @@ genrule(
cp -f $(location python/pyfory/_util.so) "$$WORK_DIR/python/pyfory/_util.pyd"
cp -f $(location python/pyfory/lib/mmh3/mmh3.so) "$$WORK_DIR/python/pyfory/lib/mmh3/mmh3.pyd"
cp -f $(location python/pyfory/format/_format.so) "$$WORK_DIR/python/pyfory/format/_format.pyd"
- cp -f $(location python/pyfory/_serialization.so) "$$WORK_DIR/python/pyfory/_serialization.pyd"
+ cp -f $(location python/pyfory/serialization.so) "$$WORK_DIR/python/pyfory/serialization.pyd"
else
cp -f $(location python/pyfory/_util.so) "$$WORK_DIR/python/pyfory"
cp -f $(location python/pyfory/lib/mmh3/mmh3.so) "$$WORK_DIR/python/pyfory/lib/mmh3"
cp -f $(location python/pyfory/format/_format.so) "$$WORK_DIR/python/pyfory/format"
- cp -f $(location python/pyfory/_serialization.so) "$$WORK_DIR/python/pyfory"
+ cp -f $(location python/pyfory/serialization.so) "$$WORK_DIR/python/pyfory"
fi
echo $$(date) > $@
""",
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 91192cbaf..01f6134a0 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -155,7 +155,7 @@ cd python
python setup.py develop
```
-- Use `cython --cplus -a pyfory/_serialization.pyx` to produce an annotated HTML file of the source code. Then you can analyze interaction between Python objects and Python's C API.
+- Use `cython --cplus -a pyfory/serialization.pyx` to produce an annotated HTML file of the source code. Then you can analyze interaction between Python objects and Python's C API.
- Read more: https://cython.readthedocs.io/en/latest/src/userguide/debugging.html
```bash
diff --git a/python/CONTRIBUTING.md b/python/CONTRIBUTING.md
index baf925c05..86e40486a 100644
--- a/python/CONTRIBUTING.md
+++ b/python/CONTRIBUTING.md
@@ -52,7 +52,7 @@ cd python
python setup.py develop
```
-- Use `cython --cplus -a pyfory/_serialization.pyx` to produce an annotated HTML file of the source code. Then you can
+- Use `cython --cplus -a pyfory/serialization.pyx` to produce an annotated HTML file of the source code. Then you can
analyze interaction between Python objects and Python's C API.
- Read more: <https://cython.readthedocs.io/en/latest/src/userguide/debugging.html>
diff --git a/python/pyfory/__init__.py b/python/pyfory/__init__.py
index 0ff316d60..2ad46266e 100644
--- a/python/pyfory/__init__.py
+++ b/python/pyfory/__init__.py
@@ -15,8 +15,8 @@
# specific language governing permissions and limitations
# under the License.
-from pyfory import lib # noqa: F401 # pylint: disable=unused-import
-from pyfory._fory import ( # noqa: F401 # pylint: disable=unused-import
+from pyfory.lib import mmh3
+from pyfory._fory import (
Fory,
Language,
ThreadSafeFory,
@@ -26,16 +26,43 @@ PYTHON = Language.PYTHON
XLANG = Language.XLANG
try:
- from pyfory._serialization import ENABLE_FORY_CYTHON_SERIALIZATION
+ from pyfory.serialization import ENABLE_FORY_CYTHON_SERIALIZATION
except ImportError:
ENABLE_FORY_CYTHON_SERIALIZATION = False
from pyfory._registry import TypeInfo
if ENABLE_FORY_CYTHON_SERIALIZATION:
- from pyfory._serialization import Fory, TypeInfo # noqa: F401,F811
+ from pyfory.serialization import Fory, TypeInfo # noqa: F401,F811
-from pyfory.serializer import * # noqa: F401,F403 # pylint: disable=unused-import
+from pyfory.serializer import ( # noqa: F401 # pylint: disable=unused-import
+ Serializer,
+ XlangCompatibleSerializer,
+ BooleanSerializer,
+ ByteSerializer,
+ Int16Serializer,
+ Int32Serializer,
+ Int64Serializer,
+ Float32Serializer,
+ Float64Serializer,
+ StringSerializer,
+ DateSerializer,
+ TimestampSerializer,
+ CollectionSerializer,
+ ListSerializer,
+ TupleSerializer,
+ StringArraySerializer,
+ SetSerializer,
+ MapSerializer,
+ EnumSerializer,
+ SliceSerializer,
+ DataClassSerializer,
+ FunctionSerializer,
+ TypeSerializer,
+ MethodSerializer,
+ ReduceSerializer,
+ StatefulSerializer,
+)
from pyfory.type import ( # noqa: F401 # pylint: disable=unused-import
record_class_factory,
get_qualified_classname,
@@ -47,23 +74,98 @@ from pyfory.type import ( # noqa: F401 # pylint: disable=unused-import
float32,
float64,
# Int8ArrayType,
- Int16ArrayType,
- Int32ArrayType,
- Int64ArrayType,
- Float32ArrayType,
- Float64ArrayType,
+ int16_array,
+ int32_array,
+ int64_array,
+ float32_array,
+ float64_array,
dataslots,
)
from pyfory.policy import DeserializationPolicy # noqa: F401 # pylint: disable=unused-import
from pyfory._util import Buffer # noqa: F401 # pylint: disable=unused-import
+__version__ = "0.13.0.dev"
+
+__all__ = [
+ # Core classes
+ "Fory",
+ "Language",
+ "ThreadSafeFory",
+ "TypeInfo",
+ "Buffer",
+ "DeserializationPolicy",
+ # Language constants
+ "PYTHON",
+ "XLANG",
+ # Type utilities
+ "record_class_factory",
+ "get_qualified_classname",
+ "TypeId",
+ "int8",
+ "int16",
+ "int32",
+ "int64",
+ "float32",
+ "float64",
+ "int16_array",
+ "int32_array",
+ "int64_array",
+ "float32_array",
+ "float64_array",
+ "dataslots",
+ # Serializers
+ "Serializer",
+ "XlangCompatibleSerializer",
+ "BooleanSerializer",
+ "ByteSerializer",
+ "Int16Serializer",
+ "Int32Serializer",
+ "Int64Serializer",
+ "Float32Serializer",
+ "Float64Serializer",
+ "StringSerializer",
+ "DateSerializer",
+ "TimestampSerializer",
+ "CollectionSerializer",
+ "ListSerializer",
+ "TupleSerializer",
+ "StringArraySerializer",
+ "SetSerializer",
+ "MapSerializer",
+ "EnumSerializer",
+ "SliceSerializer",
+ "DataClassSerializer",
+ "FunctionSerializer",
+ "TypeSerializer",
+ "MethodSerializer",
+ "ReduceSerializer",
+ "StatefulSerializer",
+ "mmh3",
+ # Version
+ "__version__",
+]
+
+# Try to import format utilities (requires pyarrow)
import warnings
try:
with warnings.catch_warnings():
warnings.filterwarnings("ignore", category=RuntimeWarning)
- from pyfory.format import * # noqa: F401,F403 # pylint: disable=unused-import
+ from pyfory.format import ( # noqa: F401 # pylint: disable=unused-import
+ create_row_encoder,
+ RowData,
+ encoder,
+ Encoder,
+ )
+
+ __all__.extend(
+ [
+ "format",
+ "create_row_encoder",
+ "RowData",
+ "encoder",
+ "Encoder",
+ ]
+ )
except (AttributeError, ImportError):
pass
-
-__version__ = "0.13.0.dev"
diff --git a/python/pyfory/_fory.py b/python/pyfory/_fory.py
index 9398a3b78..19ac14995 100644
--- a/python/pyfory/_fory.py
+++ b/python/pyfory/_fory.py
@@ -104,6 +104,44 @@ class BufferObject(ABC):
class Fory:
+ """
+ High-performance cross-language serialization framework.
+
+ Fory provides blazingly-fast serialization for Python objects with support for
+ both Python-native mode and cross-language mode. It handles complex object graphs,
+ reference tracking, and circular references automatically.
+
+ In Python-native mode (xlang=False), Fory can serialize all Python objects
+ including dataclasses, classes with custom serialization methods, and local
+ functions/classes, making it a drop-in replacement for pickle.
+
+ In cross-language mode (xlang=True), Fory serializes objects in a format that
+ can be deserialized by other Fory-supported languages (Java, Go, Rust, C++, etc).
+
+ Examples:
+ >>> import pyfory
+ >>> from dataclasses import dataclass
+ >>>
+ >>> @dataclass
+ >>> class Person:
+ ... name: str
+ ... age: pyfory.int32
+ >>>
+ >>> # Python-native mode
+ >>> fory = pyfory.Fory()
+ >>> fory.register(Person)
+ >>> data = fory.serialize(Person("Alice", 30))
+ >>> person = fory.deserialize(data)
+ >>>
+ >>> # Cross-language mode
+ >>> fory_xlang = pyfory.Fory(xlang=True)
+ >>> fory_xlang.register(Person)
+ >>> data = fory_xlang.serialize(Person("Bob", 25))
+
+ See Also:
+ ThreadSafeFory: Thread-safe wrapper for concurrent usage
+ """
+
__slots__ = (
"language",
"is_py",
@@ -138,42 +176,49 @@ class Fory:
**kwargs,
):
"""
- :param xlang:
- Whether to enable cross-language serialization. When set to False, enables Python-native
- serialization supporting all serializable Python objects including dataclasses,
- structs, classes with __getstate__/__setstate__/__reduce__/__reduce_ex__, local
- functions/classes, and classes defined in IPython. With ref=True and strict=False,
- Fury can serve as a drop-in replacement for pickle and cloudpickle.
- When set to True, serializes objects in cross-language format that can
- be deserialized by other Fury-supported languages, but Python-specific features
- like functions/classes/methods and custom __reduce__ methods are not supported.
- :param ref:
- Whether to enable reference tracking for shared and circular references.
- When enabled, duplicate objects will be stored only once and circular references
- are supported. Disabled by default for better performance.
- :param strict:
- Whether to require registering types for serialization, enabled by default.
- If disabled, unknown insecure types can be deserialized, which can be
- insecure and cause remote code execution attack if the types
- `__new__`/`__init__`/`__eq__`/`__hash__` method contain malicious code, or you
- are deserializing local functions/methods/classes.
- Do not disable strict mode if you can't ensure your environment are
- *indeed secure*. We are not responsible for security risks if
- you disable this option.
- :param policy:
- A custom type policy for deserialization security check.
- If not None, it will be used to check whether a type can be deserialized
- instead of the default type policy.
- :param compatible:
- Whether to enable compatible mode for cross-language serialization.
- When enabled, type forward/backward compatibility for dataclass fields will be enabled.
- :param max_depth:
- The maximum depth of the deserialization data.
- If the depth exceeds the maximum depth, an exception will be raised.
- The default value is 50.
- :param field_nullable:
- Whether dataclass fields are nullable for python native mode(xlang=False). When enabled, dataclass fields
- are always treated as nullable whether or not they are annotated with `Optional`.
+ Initialize a Fory serialization instance.
+
+ Args:
+ xlang: Enable cross-language serialization mode. When False (default), uses
+ Python-native mode supporting all Python objects (dataclasses, __reduce__,
+ local functions/classes). With ref=True and strict=False, serves as a
+ drop-in replacement for pickle. When True, uses cross-language format
+ compatible with other Fory languages (Java, Go, Rust, etc), but Python-
+ specific features like functions and __reduce__ methods are not supported.
+
+ ref: Enable reference tracking for shared and circular references. When enabled,
+ duplicate objects are stored once and circular references are supported.
+ Disabled by default for better performance.
+
+ strict: Require type registration before serialization (default: True). When
+ disabled, unknown types can be deserialized, which may be insecure if
+ malicious code exists in __new__/__init__/__eq__/__hash__ methods.
+ **WARNING**: Only disable in trusted environments. When disabling strict
+ mode, you should provide a custom `policy` parameter to control which types
+ are allowed. We are not responsible for security risks when this option
+ is disabled without proper policy controls.
+
+ compatible: Enable schema evolution for cross-language serialization. When
+ enabled, supports forward/backward compatibility for dataclass field
+ additions and removals.
+
+ max_depth: Maximum nesting depth for deserialization (default: 50). Raises
+ an exception if exceeded to prevent malicious deeply-nested data attacks.
+
+ policy: Custom deserialization policy for security checks. When provided,
+ it controls which types can be deserialized, overriding the default policy.
+ **Strongly recommended** when strict=False to maintain security controls.
+
+ field_nullable: Treat all dataclass fields as nullable in Python-native mode
+ (xlang=False), regardless of Optional annotation. Ignored in cross-language
+ mode.
+
+ Example:
+ >>> # Python-native mode with reference tracking
+ >>> fory = Fory(ref=True)
+ >>>
+ >>> # Cross-language mode with schema evolution
+ >>> fory = Fory(xlang=True, compatible=True)
"""
self.language = Language.XLANG if xlang else Language.PYTHON
if kwargs.get("language") is not None:
@@ -192,7 +237,7 @@ class Fory:
self.policy = policy or DEFAULT_POLICY
self.compatible = compatible
self.field_nullable = field_nullable if self.is_py else False
- from pyfory._serialization import MetaStringResolver, SerializationContext
+ from pyfory.serialization import MetaStringResolver, SerializationContext
from pyfory._registry import TypeResolver
self.metastring_resolver = MetaStringResolver()
@@ -218,6 +263,40 @@ class Fory:
typename: str = None,
serializer=None,
):
+ """
+ Register a type for serialization.
+
+ This is an alias for `register_type()`. Type registration enables Fory to
+ efficiently serialize and deserialize objects by pre-computing serialization
+ metadata.
+
+ For cross-language serialization, types can be matched between languages using:
+ 1. **type_id** (recommended): Numeric ID matching - faster and more compact
+ 2. **namespace + typename**: String-based matching - more flexible but larger overhead
+
+ Args:
+ cls: The Python type to register
+ type_id: Optional unique numeric ID for cross-language type matching.
+ Using type_id provides better performance and smaller serialized size
+ compared to namespace/typename matching.
+ namespace: Optional namespace for cross-language type matching by name.
+ Used when type_id is not specified.
+ typename: Optional type name for cross-language type matching by name.
+ Defaults to class name if not specified. Used with namespace.
+ serializer: Optional custom serializer instance for this type
+
+ Example:
+ >>> # Register with type_id (recommended for performance)
+ >>> fory = Fory(xlang=True)
+ >>> fory.register(Person, type_id=100)
+ >>>
+ >>> # Register with namespace and typename (more flexible)
+ >>> fory.register(Person, namespace="com.example", typename="Person")
+ >>>
+ >>> # Python-native mode (no cross-language matching needed)
+ >>> fory = Fory()
+ >>> fory.register(Person)
+ """
self.register_type(
cls,
type_id=type_id,
@@ -236,6 +315,39 @@ class Fory:
typename: str = None,
serializer=None,
):
+ """
+ Register a type for serialization.
+
+ Type registration enables Fory to efficiently serialize and deserialize objects
+ by pre-computing serialization metadata.
+
+ For cross-language serialization, types can be matched between languages using:
+ 1. **type_id** (recommended): Numeric ID matching - faster and more compact
+ 2. **namespace + typename**: String-based matching - more flexible but larger overhead
+
+ Args:
+ cls: The Python type to register
+ type_id: Optional unique numeric ID for cross-language type matching.
+ Using type_id provides better performance and smaller serialized size
+ compared to namespace/typename matching.
+ namespace: Optional namespace for cross-language type matching by name.
+ Used when type_id is not specified.
+ typename: Optional type name for cross-language type matching by name.
+ Defaults to class name if not specified. Used with namespace.
+ serializer: Optional custom serializer instance for this type
+
+ Example:
+ >>> # Register with type_id (recommended for performance)
+ >>> fory = Fory(xlang=True)
+ >>> fory.register_type(Person, type_id=100)
+ >>>
+ >>> # Register with namespace and typename (more flexible)
+ >>> fory.register_type(Person, namespace="com.example", typename="Person")
+ >>>
+ >>> # Python-native mode (no cross-language matching needed)
+ >>> fory = Fory()
+ >>> fory.register_type(Person)
+ """
return self.type_resolver.register_type(
cls,
type_id=type_id,
@@ -245,6 +357,20 @@ class Fory:
)
def register_serializer(self, cls: type, serializer):
+ """
+ Register a custom serializer for a type.
+
+ Allows you to provide a custom serializer implementation for a specific type,
+ overriding Fory's default serialization behavior.
+
+ Args:
+ cls: The Python type to associate with the serializer
+ serializer: Custom serializer instance implementing the Serializer protocol
+
+ Example:
+ >>> fory = Fory()
+ >>> fory.register_serializer(MyClass, MyCustomSerializer())
+ """
self.type_resolver.register_serializer(cls, serializer)
def dumps(
@@ -277,6 +403,28 @@ class Fory:
buffer_callback=None,
unsupported_callback=None,
) -> Union[Buffer, bytes]:
+ """
+ Serialize a Python object to bytes.
+
+ Converts the object into Fory's binary format. The serialization process
+ automatically handles reference tracking (if enabled), type information,
+ and nested objects.
+
+ Args:
+ obj: The object to serialize
+ buffer: Optional pre-allocated buffer to write to. If None, uses internal buffer
+ buffer_callback: Optional callback for out-of-band buffer serialization
+ unsupported_callback: Optional callback for handling unsupported types
+
+ Returns:
+ Serialized bytes if buffer is None, otherwise returns the provided buffer
+
+ Example:
+ >>> fory = Fory()
+ >>> data = fory.serialize({"key": "value", "num": 42})
+ >>> print(type(data))
+ <class 'bytes'>
+ """
try:
return self._serialize(
obj,
@@ -294,6 +442,8 @@ class Fory:
buffer_callback=None,
unsupported_callback=None,
) -> Union[Buffer, bytes]:
+ assert self.depth == 0, "Nested serialization should use write_ref/write_no_ref/xwrite_ref/xwrite_no_ref."
+ self.depth += 1
self._buffer_callback = buffer_callback
self._unsupported_callback = unsupported_callback
if buffer is None:
@@ -334,7 +484,7 @@ class Fory:
buffer.write_int32(-1) # Reserve 4 bytes for type definitions offset
if self.language == Language.PYTHON:
- self.serialize_ref(buffer, obj)
+ self.write_ref(buffer, obj)
else:
self.xwrite_ref(buffer, obj)
@@ -346,14 +496,12 @@ class Fory:
current_pos = buffer.writer_index
buffer.put_int32(type_defs_offset_pos, current_pos - type_defs_offset_pos - 4)
self.type_resolver.write_type_defs(buffer)
-
- self.reset_write()
if buffer is not self.buffer:
return buffer
else:
return buffer.to_bytes(0, buffer.writer_index)
- def serialize_ref(self, buffer, obj, typeinfo=None):
+ def write_ref(self, buffer, obj, typeinfo=None):
cls = type(obj)
if cls is str:
buffer.write_int16(NOT_NULL_STRING_FLAG)
@@ -374,7 +522,7 @@ class Fory:
self.type_resolver.write_typeinfo(buffer, typeinfo)
typeinfo.serializer.write(buffer, obj)
- def serialize_nonref(self, buffer, obj):
+ def write_no_ref(self, buffer, obj):
cls = type(obj)
if cls is str:
buffer.write_varuint32(STRING_TYPE_ID)
@@ -419,6 +567,28 @@ class Fory:
buffers: Iterable = None,
unsupported_objects: Iterable = None,
):
+ """
+ Deserialize bytes back to a Python object.
+
+ Reconstructs an object from Fory's binary format. The deserialization process
+ automatically handles reference resolution (if enabled), type instantiation,
+ and nested objects.
+
+ Args:
+ buffer: Serialized bytes or Buffer to deserialize from
+ buffers: Optional iterable of buffers for out-of-band deserialization
+ unsupported_objects: Optional iterable of objects for unsupported type handling
+
+ Returns:
+ The deserialized Python object
+
+ Example:
+ >>> fory = Fory()
+ >>> data = fory.serialize({"key": "value"})
+ >>> obj = fory.deserialize(data)
+ >>> print(obj)
+ {'key': 'value'}
+ """
try:
return self._deserialize(buffer, buffers, unsupported_objects)
finally:
@@ -430,6 +600,8 @@ class Fory:
buffers: Iterable = None,
unsupported_objects: Iterable = None,
):
+ assert self.depth == 0, "Nested deserialization should use read_ref/read_no_ref/xread_ref/xread_no_ref."
+ self.depth += 1
if isinstance(buffer, bytes):
buffer = Buffer(buffer)
if unsupported_objects is not None:
@@ -568,6 +740,13 @@ class Fory:
return self.read_ref(buffer)
def reset_write(self):
+ """
+ Reset write state after serialization.
+
+ Clears internal write buffers, reference tracking state, and type resolution
+ caches. This method is automatically called after each serialization.
+ """
+ self.depth = 0
self.ref_resolver.reset_write()
self.type_resolver.reset_write()
self.serialization_context.reset_write()
@@ -576,6 +755,12 @@ class Fory:
self._unsupported_callback = None
def reset_read(self):
+ """
+ Reset read state after deserialization.
+
+ Clears internal read buffers, reference tracking state, and type resolution
+ caches. This method is automatically called after each deserialization.
+ """
self.depth = 0
self.ref_resolver.reset_read()
self.type_resolver.reset_read()
@@ -585,6 +770,13 @@ class Fory:
self._unsupported_objects = None
def reset(self):
+ """
+ Reset both write and read state.
+
+ Clears all internal state including buffers, reference tracking, and type
+ resolution caches. Use this to ensure a clean state before reusing a Fory
+ instance.
+ """
self.reset_write()
self.reset_read()
@@ -673,10 +865,10 @@ class ThreadSafeFory:
def _get_fory_class(self):
try:
- from pyfory._serialization import ENABLE_FORY_CYTHON_SERIALIZATION
+ from pyfory.serialization import ENABLE_FORY_CYTHON_SERIALIZATION
if ENABLE_FORY_CYTHON_SERIALIZATION:
- from pyfory._serialization import Fory as CythonFory
+ from pyfory.serialization import Fory as CythonFory
return CythonFory
except ImportError:
diff --git a/python/pyfory/_registry.py b/python/pyfory/_registry.py
index 8a0d7d805..3ecb90f61 100644
--- a/python/pyfory/_registry.py
+++ b/python/pyfory/_registry.py
@@ -101,7 +101,7 @@ namespace_decoder = MetaStringDecoder(".", "_")
typename_decoder = MetaStringDecoder("$", "_")
if ENABLE_FORY_CYTHON_SERIALIZATION:
- from pyfory._serialization import TypeInfo
+ from pyfory.serialization import TypeInfo
else:
class TypeInfo:
diff --git a/python/pyfory/_serializer.py b/python/pyfory/_serializer.py
index 5950e1ece..17c9d9ed2 100644
--- a/python/pyfory/_serializer.py
+++ b/python/pyfory/_serializer.py
@@ -85,7 +85,7 @@ class Serializer(ABC):
return False
-class CrossLanguageCompatibleSerializer(Serializer):
+class XlangCompatibleSerializer(Serializer):
def __init__(self, fory, type_):
super().__init__(fory, type_)
@@ -96,7 +96,7 @@ class CrossLanguageCompatibleSerializer(Serializer):
return self.read(buffer)
-class BooleanSerializer(CrossLanguageCompatibleSerializer):
+class BooleanSerializer(XlangCompatibleSerializer):
def write(self, buffer, value):
buffer.write_bool(value)
@@ -104,7 +104,7 @@ class BooleanSerializer(CrossLanguageCompatibleSerializer):
return buffer.read_bool()
-class ByteSerializer(CrossLanguageCompatibleSerializer):
+class ByteSerializer(XlangCompatibleSerializer):
def write(self, buffer, value):
buffer.write_int8(value)
@@ -112,7 +112,7 @@ class ByteSerializer(CrossLanguageCompatibleSerializer):
return buffer.read_int8()
-class Int16Serializer(CrossLanguageCompatibleSerializer):
+class Int16Serializer(XlangCompatibleSerializer):
def write(self, buffer, value):
buffer.write_int16(value)
@@ -120,7 +120,7 @@ class Int16Serializer(CrossLanguageCompatibleSerializer):
return buffer.read_int16()
-class Int32Serializer(CrossLanguageCompatibleSerializer):
+class Int32Serializer(XlangCompatibleSerializer):
def write(self, buffer, value):
buffer.write_varint32(value)
@@ -142,7 +142,7 @@ class Int64Serializer(Serializer):
return buffer.read_varint64()
-class Float32Serializer(CrossLanguageCompatibleSerializer):
+class Float32Serializer(XlangCompatibleSerializer):
def write(self, buffer, value):
buffer.write_float(value)
@@ -150,7 +150,7 @@ class Float32Serializer(CrossLanguageCompatibleSerializer):
return buffer.read_float()
-class Float64Serializer(CrossLanguageCompatibleSerializer):
+class Float64Serializer(XlangCompatibleSerializer):
def write(self, buffer, value):
buffer.write_double(value)
@@ -158,7 +158,7 @@ class Float64Serializer(CrossLanguageCompatibleSerializer):
return buffer.read_double()
-class StringSerializer(CrossLanguageCompatibleSerializer):
+class StringSerializer(XlangCompatibleSerializer):
def __init__(self, fory, type_):
super().__init__(fory, type_)
self.need_to_write_ref = False
@@ -173,7 +173,7 @@ class StringSerializer(CrossLanguageCompatibleSerializer):
_base_date = datetime.date(1970, 1, 1)
-class DateSerializer(CrossLanguageCompatibleSerializer):
+class DateSerializer(XlangCompatibleSerializer):
def write(self, buffer, value: datetime.date):
if not isinstance(value, datetime.date):
raise TypeError("{} should be {} instead of {}".format(value, datetime.date, type(value)))
@@ -185,7 +185,7 @@ class DateSerializer(CrossLanguageCompatibleSerializer):
return _base_date + datetime.timedelta(days=days)
-class TimestampSerializer(CrossLanguageCompatibleSerializer):
+class TimestampSerializer(XlangCompatibleSerializer):
__win_platform = platform.system() == "Windows"
def _get_timestamp(self, value: datetime.datetime):
@@ -464,7 +464,7 @@ class MapSerializer(Serializer):
items_iter = iter(obj.items())
key, value = next(items_iter)
has_next = True
- serialize_ref = fory.serialize_ref if self.fory.is_py else fory.xwrite_ref
+ write_ref = fory.write_ref if self.fory.is_py else fory.xwrite_ref
while has_next:
while True:
if key is not None:
@@ -480,7 +480,7 @@ class MapSerializer(Serializer):
self._write_obj(key_serializer, buffer, key)
else:
buffer.write_int8(VALUE_HAS_NULL | TRACKING_KEY_REF)
- serialize_ref(buffer, key)
+ write_ref(buffer, key)
else:
if value is not None:
if value_serializer is not None:
@@ -495,7 +495,7 @@ class MapSerializer(Serializer):
value_serializer.write(buffer, value)
else:
buffer.write_int8(KEY_HAS_NULL | TRACKING_VALUE_REF)
- serialize_ref(buffer, value)
+ write_ref(buffer, value)
else:
buffer.write_int8(KV_NULL)
try:
@@ -707,7 +707,7 @@ class SliceSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- self.fory.serialize_nonref(buffer, start)
+ self.fory.write_no_ref(buffer, start)
if type(stop) is int:
# TODO support varint128
buffer.write_int16(NOT_NULL_INT64_FLAG)
@@ -717,7 +717,7 @@ class SliceSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- self.fory.serialize_nonref(buffer, stop)
+ self.fory.write_no_ref(buffer, stop)
if type(step) is int:
# TODO support varint128
buffer.write_int16(NOT_NULL_INT64_FLAG)
@@ -727,7 +727,7 @@ class SliceSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- self.fory.serialize_nonref(buffer, step)
+ self.fory.write_no_ref(buffer, step)
def read(self, buffer):
if buffer.read_int8() == NULL_FLAG:
diff --git a/python/pyfory/codegen.py b/python/pyfory/codegen.py
index dfabe00ce..f587464db 100644
--- a/python/pyfory/codegen.py
+++ b/python/pyfory/codegen.py
@@ -110,16 +110,16 @@ def compile_function(
from pyfory import ENABLE_FORY_CYTHON_SERIALIZATION
if ENABLE_FORY_CYTHON_SERIALIZATION:
- from pyfory import _serialization
-
- context["write_nullable_pybool"] = _serialization.write_nullable_pybool
- context["read_nullable_pybool"] = _serialization.read_nullable_pybool
- context["write_nullable_pyint64"] = _serialization.write_nullable_pyint64
- context["read_nullable_pyint64"] = _serialization.read_nullable_pyint64
- context["write_nullable_pyfloat64"] = _serialization.write_nullable_pyfloat64
- context["read_nullable_pyfloat64"] = _serialization.read_nullable_pyfloat64
- context["write_nullable_pystr"] = _serialization.write_nullable_pystr
- context["read_nullable_pystr"] = _serialization.read_nullable_pystr
+ from pyfory import serialization
+
+ context["write_nullable_pybool"] = serialization.write_nullable_pybool
+ context["read_nullable_pybool"] = serialization.read_nullable_pybool
+ context["write_nullable_pyint64"] = serialization.write_nullable_pyint64
+ context["read_nullable_pyint64"] = serialization.read_nullable_pyint64
+ context["write_nullable_pyfloat64"] = serialization.write_nullable_pyfloat64
+ context["read_nullable_pyfloat64"] = serialization.read_nullable_pyfloat64
+ context["write_nullable_pystr"] = serialization.write_nullable_pystr
+ context["read_nullable_pystr"] = serialization.read_nullable_pystr
stmts = [f"{ident(statement)}" for statement in stmts]
# Sanitize the function name to ensure it is valid Python syntax
sanitized_function_name = _sanitize_function_name(function_name)
diff --git a/python/pyfory/format/serializer.py b/python/pyfory/format/serializer.py
index f176e5756..aaaf1519f 100644
--- a/python/pyfory/format/serializer.py
+++ b/python/pyfory/format/serializer.py
@@ -16,11 +16,11 @@
# under the License.
import pyarrow as pa
-from pyfory.serializer import CrossLanguageCompatibleSerializer, BufferObject
+from pyfory.serializer import XlangCompatibleSerializer, BufferObject
from pyfory.buffer import Buffer
-class ArrowRecordBatchSerializer(CrossLanguageCompatibleSerializer):
+class ArrowRecordBatchSerializer(XlangCompatibleSerializer):
def write(self, buffer, value: pa.RecordBatch):
self.fory.write_buffer_object(buffer, ArrowRecordBatchBufferObject(value))
@@ -69,7 +69,7 @@ class ArrowRecordBatchBufferObject(BufferObject):
stream_writer.close()
-class ArrowTableSerializer(CrossLanguageCompatibleSerializer):
+class ArrowTableSerializer(XlangCompatibleSerializer):
def write(self, buffer, value: pa.Table):
self.fory.write_buffer_object(buffer, ArrowTableBufferObject(value))
diff --git a/python/pyfory/format/tests/test_vectorized.py b/python/pyfory/format/tests/test_vectorized.py
index 560fe5ceb..a0ebea1ff 100644
--- a/python/pyfory/format/tests/test_vectorized.py
+++ b/python/pyfory/format/tests/test_vectorized.py
@@ -37,7 +37,7 @@ def test_vectorized():
],
metadata={"cls": fory.get_qualified_classname(cls)},
)
- writer = fory.ArrowWriter(schema)
+ writer = fory.format.ArrowWriter(schema)
encoder = fory.create_row_encoder(schema)
num_rows = 10
data = [[] for _ in range(len(field_names))]
@@ -73,7 +73,7 @@ def test_vectorized_map():
metadata={"cls": fory.get_qualified_classname(cls)},
)
print(schema)
- writer = fory.ArrowWriter(schema)
+ writer = fory.format.ArrowWriter(schema)
encoder = fory.create_row_encoder(schema)
num_rows = 5
data = []
diff --git a/python/pyfory/_serialization.pyx b/python/pyfory/serialization.pyx
similarity index 85%
rename from python/pyfory/_serialization.pyx
rename to python/pyfory/serialization.pyx
index 9f246ff08..6026ef348 100644
--- a/python/pyfory/_serialization.pyx
+++ b/python/pyfory/serialization.pyx
@@ -92,6 +92,22 @@ typename_decoder = MetaStringDecoder("$", "_")
@cython.final
cdef class MapRefResolver:
+ """
+ Manages object reference tracking during serialization and deserialization.
+
+ Handles shared and circular references by assigning unique IDs to objects
+ during serialization and resolving them during deserialization. This enables
+ efficient serialization of object graphs with duplicate references and prevents
+ infinite recursion with circular references.
+
+ When ref_tracking is enabled, duplicate object references are serialized only once,
+ with subsequent references storing only the reference ID. During deserialization,
+ the resolver maintains a mapping to reconstruct the exact same object graph structure.
+
+ Note:
+ This is an internal class used by the Fory serializer. Users typically don't
+ interact with this class directly.
+ """
cdef flat_hash_map[uint64_t, int32_t] written_objects_id # id(obj) -> ref_id
# Hold object to avoid tmp object gc when serialize nested fields/objects.
cdef vector[PyObject *] written_objects
@@ -454,6 +470,25 @@ cdef class TypeInfo:
@cython.final
cdef class TypeResolver:
+ """
+ Manages type registration, resolution, and serializer dispatch.
+
+ TypeResolver maintains mappings between Python types and their corresponding
+ serialization metadata (TypeInfo), including serializers, type IDs, and
+ cross-language type names. It handles both registered types (with explicit type IDs)
+ and dynamic types (resolved at runtime).
+
+ For cross-language serialization, TypeResolver coordinates namespace and typename
+ encoding using MetaString compression, and manages type definition sharing when
+ compatible mode is enabled.
+
+ The resolver uses high-performance C++ hash maps for fast type lookups during
+ serialization and deserialization.
+
+ Note:
+ This is an internal class used by the Fory serializer. Users typically don't
+ interact with this class directly, but instead use Fory.register() methods.
+ """
cdef:
readonly Fory fory
readonly MetaStringResolver metastring_resolver
@@ -666,10 +701,19 @@ cdef class TypeResolver:
@cython.final
cdef class MetaContext:
"""
- Context for sharing type meta across multiple serialization. Type name, field name and field
- type will be shared between different serialization.
+ Manages type metadata sharing across serializations in compatible mode.
+
+ When compatible mode is enabled, MetaContext tracks type definitions (type names,
+ field names, field types) to enable efficient schema evolution. Instead of sending
+ full type metadata with every serialized object, the context sends type definitions
+ once and references them by ID in subsequent serializations.
+
+ This enables forward/backward compatibility when struct fields are added or removed
+ between different versions of an application.
- Note that this context is not thread-safe, you should use it with one Fory instance.
+ Note:
+ This is an internal class used by SerializationContext. It is not thread-safe
+ and should only be used with a single Fory instance.
"""
cdef:
# Types which have sent definitions to peer
@@ -752,8 +796,19 @@ cdef class MetaContext:
@cython.final
cdef class SerializationContext:
"""
- Context for sharing data across multiple serialization.
- Note that this context is not thread-safe, you should use it with one Fory instance.
+ Manages serialization state and metadata sharing across operations.
+
+ SerializationContext provides a scoped storage for sharing data during serialization
+ and deserialization operations. When compatible mode is enabled, it maintains a
+ MetaContext for efficient type metadata sharing to support schema evolution.
+
+ The context stores temporary objects needed during serialization (e.g., class
+ definitions, custom serialization state) and coordinates type definition exchange
+ between serializer and deserializer.
+
+ Note:
+ This is an internal class used by the Fory serializer. It is not thread-safe
+ and should only be used with a single Fory instance.
"""
cdef dict objects
cdef readonly bint scoped_meta_share_enabled
@@ -800,6 +855,43 @@ cdef class SerializationContext:
@cython.final
cdef class Fory:
+ """
+ High-performance cross-language serialization framework.
+
+ Fory provides blazingly-fast serialization for Python objects with support for
+ both Python-native mode and cross-language mode. It handles complex object graphs,
+ reference tracking, and circular references automatically.
+
+ In Python-native mode (xlang=False), Fory can serialize all Python objects
+ including dataclasses, classes with custom serialization methods, and local
+ functions/classes, making it a drop-in replacement for pickle.
+
+ In cross-language mode (xlang=True), Fory serializes objects in a format that
+ can be deserialized by other Fory-supported languages (Java, Go, Rust, C++, etc).
+
+ Examples:
+ >>> import pyfory
+ >>> from dataclasses import dataclass
+ >>>
+ >>> @dataclass
+ ... class Person:
+ ... name: str
+ ... age: pyfory.int32
+ >>>
+ >>> # Python-native mode
+ >>> fory = pyfory.Fory()
+ >>> fory.register(Person)
+ >>> data = fory.serialize(Person("Alice", 30))
+ >>> person = fory.deserialize(data)
+ >>>
+ >>> # Cross-language mode
+ >>> fory_xlang = pyfory.Fory(xlang=True)
+ >>> fory_xlang.register(Person)
+ >>> data = fory_xlang.serialize(Person("Bob", 25))
+
+ See Also:
+ ThreadSafeFory: Thread-safe wrapper for concurrent usage
+ """
cdef readonly object language
cdef readonly c_bool ref_tracking
cdef readonly c_bool strict
@@ -832,42 +924,49 @@ cdef class Fory:
**kwargs,
):
"""
- :param xlang:
- Whether to enable cross-language serialization. When set to False, enables Python-native
- serialization supporting all serializable Python objects including dataclasses,
- structs, classes with __getstate__/__setstate__/__reduce__/__reduce_ex__, local
- functions/classes, and classes defined in IPython. With ref=True and strict=False,
- Fury can serve as a drop-in replacement for pickle and cloudpickle.
- When set to True, serializes objects in cross-language format that can
- be deserialized by other Fury-supported languages, but Python-specific features
- like functions/classes/methods and custom __reduce__ methods are not supported.
- :param ref:
- Whether to enable reference tracking for shared and circular references.
- When enabled, duplicate objects will be stored only once and circular references
- are supported. Disabled by default for better performance.
- :param strict:
- Whether to require registering types for serialization, enabled by default.
- If disabled, unknown insecure types can be deserialized, which can be
- insecure and cause remote code execution attack if the types
- `__new__`/`__init__`/`__eq__`/`__hash__` method contain malicious code, or you
- are deserializing local functions/methods/classes.
- Do not disable strict mode if you can't ensure your environment are
- *indeed secure*. We are not responsible for security risks if
- you disable this option.
- :param policy:
- A custom type policy for deserialization security check.
- If not None, it will be used to check whether a type can be deserialized
- instead of the default type policy.
- :param compatible:
- Whether to enable compatible mode for cross-language serialization.
- When enabled, type forward/backward compatibility for struct fields will be enabled.
- :param max_depth:
- The maximum depth of the deserialization data.
- If the depth exceeds the maximum depth, an exception will be raised.
- The default value is 50.
- :param field_nullable:
- Whether dataclass fields are nullable for python native mode(xlang=False). When enabled, dataclass fields
- are always treated as nullable whether or not they are annotated with `Optional`.
+ Initialize a Fory serialization instance.
+
+ Args:
+ xlang: Enable cross-language serialization mode. When False (default), uses
+ Python-native mode supporting all Python objects (dataclasses, __reduce__,
+ local functions/classes). With ref=True and strict=False, serves as a
+ drop-in replacement for pickle. When True, uses cross-language format
+ compatible with other Fory languages (Java, Go, Rust, etc), but
+ Python-specific features like functions and __reduce__ methods are not supported.
+
+ ref: Enable reference tracking for shared and circular references. When enabled,
+ duplicate objects are stored once and circular references are supported.
+ Disabled by default for better performance.
+
+ strict: Require type registration before serialization (default: True). When
+ disabled, unknown types can be deserialized, which may be insecure if
+ malicious code exists in __new__/__init__/__eq__/__hash__ methods.
+ **WARNING**: Only disable in trusted environments. When disabling strict
+ mode, you should provide a custom `policy` parameter to control which types
+ are allowed. We are not responsible for security risks when this option
+ is disabled without proper policy controls.
+
+ compatible: Enable schema evolution for cross-language serialization. When
+ enabled, supports forward/backward compatibility for struct field
+ additions and removals.
+
+ max_depth: Maximum nesting depth for deserialization (default: 50). Raises
+ an exception if exceeded to prevent malicious deeply-nested data attacks.
+
+ policy: Custom deserialization policy for security checks. When provided,
+ it controls which types can be deserialized, overriding the default policy.
+ **Strongly recommended** when strict=False to maintain security controls.
+
+ field_nullable: Treat all dataclass fields as nullable in Python-native mode
+ (xlang=False), regardless of Optional annotation. Ignored in cross-language
+ mode.
+
+ Example:
+ >>> # Python-native mode with reference tracking
+ >>> fory = Fory(ref=True)
+ >>>
+ >>> # Cross-language mode with schema evolution
+ >>> fory = Fory(xlang=True, compatible=True)
"""
self.language = Language.XLANG if xlang else Language.PYTHON
if kwargs.get("language") is not None:
@@ -900,6 +999,20 @@ cdef class Fory:
self.max_depth = max_depth
def register_serializer(self, cls: Union[type, TypeVar], Serializer serializer):
+ """
+ Register a custom serializer for a type.
+
+ Allows you to provide a custom serializer implementation for a specific type,
+ overriding Fory's default serialization behavior.
+
+ Args:
+ cls: The Python type to associate with the serializer
+ serializer: Custom serializer instance implementing the Serializer protocol
+
+ Example:
+ >>> fory = Fory()
+ >>> fory.register_serializer(MyClass, MyCustomSerializer())
+ """
self.type_resolver.register_serializer(cls, serializer)
def register(
@@ -911,6 +1024,40 @@ cdef class Fory:
typename: str = None,
serializer=None,
):
+ """
+ Register a type for serialization.
+
+ This is an alias for `register_type()`. Type registration enables Fory to
+ efficiently serialize and deserialize objects by pre-computing serialization
+ metadata.
+
+ For cross-language serialization, types can be matched between languages using:
+ 1. **type_id** (recommended): Numeric ID matching - faster and more compact
+ 2. **namespace + typename**: String-based matching - more flexible but larger overhead
+
+ Args:
+ cls: The Python type to register
+ type_id: Optional unique numeric ID for cross-language type matching.
+ Using type_id provides better performance and smaller serialized size
+ compared to namespace/typename matching.
+ namespace: Optional namespace for cross-language type matching by name.
+ Used when type_id is not specified.
+ typename: Optional type name for cross-language type matching by name.
+ Defaults to class name if not specified. Used with namespace.
+ serializer: Optional custom serializer instance for this type
+
+ Example:
+ >>> # Register with type_id (recommended for performance)
+ >>> fory = Fory(xlang=True)
+ >>> fory.register(Person, type_id=100)
+ >>>
+ >>> # Register with namespace and typename (more flexible)
+ >>> fory.register(Person, namespace="com.example", typename="Person")
+ >>>
+ >>> # Python-native mode (no cross-language matching needed)
+ >>> fory = Fory()
+ >>> fory.register(Person)
+ """
self.type_resolver.register_type(
cls, type_id=type_id, namespace=namespace, typename=typename, serializer=serializer)
@@ -923,6 +1070,39 @@ cdef class Fory:
typename: str = None,
serializer=None,
):
+ """
+ Register a type for serialization.
+
+ Type registration enables Fory to efficiently serialize and deserialize objects
+ by pre-computing serialization metadata.
+
+ For cross-language serialization, types can be matched between languages using:
+ 1. **type_id** (recommended): Numeric ID matching - faster and more compact
+ 2. **namespace + typename**: String-based matching - more flexible but larger overhead
+
+ Args:
+ cls: The Python type to register
+ type_id: Optional unique numeric ID for cross-language type matching.
+ Using type_id provides better performance and smaller serialized size
+ compared to namespace/typename matching.
+ namespace: Optional namespace for cross-language type matching by name.
+ Used when type_id is not specified.
+ typename: Optional type name for cross-language type matching by name.
+ Defaults to class name if not specified. Used with namespace.
+ serializer: Optional custom serializer instance for this type
+
+ Example:
+ >>> # Register with type_id (recommended for performance)
+ >>> fory = Fory(xlang=True)
+ >>> fory.register_type(Person, type_id=100)
+ >>>
+ >>> # Register with namespace and typename (more flexible)
+ >>> fory.register_type(Person, namespace="com.example", typename="Person")
+ >>>
+ >>> # Python-native mode (no cross-language matching needed)
+ >>> fory = Fory()
+ >>> fory.register_type(Person)
+ """
self.type_resolver.register_type(
cls, type_id=type_id, namespace=namespace, typename=typename, serializer=serializer)
@@ -955,6 +1135,28 @@ cdef class Fory:
buffer_callback=None,
unsupported_callback=None
) -> Union[Buffer, bytes]:
+ """
+ Serialize a Python object to bytes.
+
+ Converts the object into Fory's binary format. The serialization process
+ automatically handles reference tracking (if enabled), type information,
+ and nested objects.
+
+ Args:
+ obj: The object to serialize
+ buffer: Optional pre-allocated buffer to write to. If None, uses internal buffer
+ buffer_callback: Optional callback for out-of-band buffer serialization
+ unsupported_callback: Optional callback for handling unsupported types
+
+ Returns:
+ Serialized bytes if buffer is None, otherwise returns the provided buffer
+
+ Example:
+ >>> fory = Fory()
+ >>> data = fory.serialize({"key": "value", "num": 42})
+ >>> print(type(data))
+ <class 'bytes'>
+ """
try:
return self._serialize(
obj,
@@ -966,6 +1168,8 @@ cdef class Fory:
cpdef inline _serialize(
self, obj, Buffer buffer, buffer_callback=None, unsupported_callback=None):
+ assert self.depth == 0, "Nested serialization should use write_ref/write_no_ref/xwrite_ref/xwrite_no_ref."
+ self.depth += 1
self.buffer_callback = buffer_callback
self._unsupported_callback = unsupported_callback
if buffer is None:
@@ -1007,7 +1211,7 @@ cdef class Fory:
cdef int32_t start_offset
if self.language == Language.PYTHON:
- self.serialize_ref(buffer, obj)
+ self.write_ref(buffer, obj)
else:
self.xwrite_ref(buffer, obj)
@@ -1025,7 +1229,7 @@ cdef class Fory:
else:
return buffer.to_bytes(0, buffer.writer_index)
- cpdef inline serialize_ref(
+ cpdef inline write_ref(
self, Buffer buffer, obj, TypeInfo typeinfo=None):
cls = type(obj)
if cls is str:
@@ -1051,7 +1255,7 @@ cdef class Fory:
self.type_resolver.write_typeinfo(buffer, typeinfo)
typeinfo.serializer.write(buffer, obj)
- cpdef inline serialize_nonref(self, Buffer buffer, obj):
+ cpdef inline write_no_ref(self, Buffer buffer, obj):
cls = type(obj)
if cls is str:
buffer.write_varuint32(STRING_TYPE_ID)
@@ -1103,6 +1307,28 @@ cdef class Fory:
buffers: Iterable = None,
unsupported_objects: Iterable = None,
):
+ """
+ Deserialize bytes back to a Python object.
+
+ Reconstructs an object from Fory's binary format. The deserialization process
+ automatically handles reference resolution (if enabled), type instantiation,
+ and nested objects.
+
+ Args:
+ buffer: Serialized bytes or Buffer to deserialize from
+ buffers: Optional iterable of buffers for out-of-band deserialization
+ unsupported_objects: Optional iterable of objects for unsupported type handling
+
+ Returns:
+ The deserialized Python object
+
+ Example:
+ >>> fory = Fory()
+ >>> data = fory.serialize({"key": "value"})
+ >>> obj = fory.deserialize(data)
+ >>> print(obj)
+ {'key': 'value'}
+ """
try:
if type(buffer) == bytes:
buffer = Buffer(buffer)
@@ -1112,6 +1338,8 @@ cdef class Fory:
cpdef inline _deserialize(
self, Buffer buffer, buffers=None, unsupported_objects=None):
+ assert self.depth == 0, "Nested deserialization should use read_ref/read_no_ref/xread_ref/xread_no_ref."
+ self.depth += 1
if unsupported_objects is not None:
self._unsupported_objects = iter(unsupported_objects)
if self.language == Language.XLANG:
@@ -1308,6 +1536,13 @@ cdef class Fory:
return o
cpdef inline reset_write(self):
+ """
+ Reset write state after serialization.
+
+ Clears internal write buffers, reference tracking state, and type resolution
+ caches. This method is automatically called after each serialization.
+ """
+ self.depth = 0
self.ref_resolver.reset_write()
self.type_resolver.reset_write()
self.metastring_resolver.reset_write()
@@ -1315,6 +1550,12 @@ cdef class Fory:
self._unsupported_callback = None
cpdef inline reset_read(self):
+ """
+ Reset read state after deserialization.
+
+ Clears internal read buffers, reference tracking state, and type resolution
+ caches. This method is automatically called after each deserialization.
+ """
self.depth = 0
self.ref_resolver.reset_read()
self.type_resolver.reset_read()
@@ -1324,6 +1565,13 @@ cdef class Fory:
self._unsupported_objects = None
cpdef inline reset(self):
+ """
+ Reset both write and read state.
+
+ Clears all internal state including buffers, reference tracking, and type
+ resolution caches. Use this to ensure a clean state before reusing a Fory
+ instance.
+ """
self.reset_write()
self.reset_read()
@@ -1381,6 +1629,30 @@ cpdef inline read_nullable_pystr(Buffer buffer):
cdef class Serializer:
+ """
+ Base class for type-specific serializers.
+
+ Serializer defines the interface for serializing and deserializing objects of a
+ specific type. Each serializer implements two modes:
+
+ - Python-native mode (write/read): Optimized for Python-to-Python serialization,
+ supporting all Python-specific features like __reduce__, local functions, etc.
+
+ - Cross-language mode (xwrite/xread): Serializes to a cross-language format
+ compatible with other Fory implementations (Java, Go, Rust, C++, etc).
+
+ Custom serializers can be registered for user-defined types using
+ Fory.register_serializer() to override default serialization behavior.
+
+ Attributes:
+ fory: The Fory instance this serializer belongs to
+ type_: The Python type this serializer handles
+ need_to_write_ref: Whether reference tracking is needed for this type
+
+ Note:
+ This is a base class for implementing custom serializers. Subclasses must
+ implement write(), read(), xwrite(), and xread() methods.
+ """
cdef readonly Fory fory
cdef readonly object type_
cdef public c_bool need_to_write_ref
@@ -1406,7 +1678,7 @@ cdef class Serializer:
def support_subclass(cls) -> bool:
return False
-cdef class CrossLanguageCompatibleSerializer(Serializer):
+cdef class XlangCompatibleSerializer(Serializer):
cpdef xwrite(self, Buffer buffer, value):
self.write(buffer, value)
@@ -1415,7 +1687,7 @@ cdef class CrossLanguageCompatibleSerializer(Serializer):
@cython.final
-cdef class BooleanSerializer(CrossLanguageCompatibleSerializer):
+cdef class BooleanSerializer(XlangCompatibleSerializer):
cpdef inline write(self, Buffer buffer, value):
buffer.write_bool(value)
@@ -1424,7 +1696,7 @@ cdef class BooleanSerializer(CrossLanguageCompatibleSerializer):
@cython.final
-cdef class ByteSerializer(CrossLanguageCompatibleSerializer):
+cdef class ByteSerializer(XlangCompatibleSerializer):
cpdef inline write(self, Buffer buffer, value):
buffer.write_int8(value)
@@ -1433,7 +1705,7 @@ cdef class ByteSerializer(CrossLanguageCompatibleSerializer):
@cython.final
-cdef class Int16Serializer(CrossLanguageCompatibleSerializer):
+cdef class Int16Serializer(XlangCompatibleSerializer):
cpdef inline write(self, Buffer buffer, value):
buffer.write_int16(value)
@@ -1442,7 +1714,7 @@ cdef class Int16Serializer(CrossLanguageCompatibleSerializer):
@cython.final
-cdef class Int32Serializer(CrossLanguageCompatibleSerializer):
+cdef class Int32Serializer(XlangCompatibleSerializer):
cpdef inline write(self, Buffer buffer, value):
buffer.write_varint32(value)
@@ -1451,7 +1723,7 @@ cdef class Int32Serializer(CrossLanguageCompatibleSerializer):
@cython.final
-cdef class Int64Serializer(CrossLanguageCompatibleSerializer):
+cdef class Int64Serializer(XlangCompatibleSerializer):
cpdef inline xwrite(self, Buffer buffer, value):
buffer.write_varint64(value)
@@ -1475,7 +1747,7 @@ cdef float FLOAT32_MAX_VALUE = 3.40282e+38
@cython.final
-cdef class Float32Serializer(CrossLanguageCompatibleSerializer):
+cdef class Float32Serializer(XlangCompatibleSerializer):
cpdef inline write(self, Buffer buffer, value):
buffer.write_float(value)
@@ -1484,7 +1756,7 @@ cdef class Float32Serializer(CrossLanguageCompatibleSerializer):
@cython.final
-cdef class Float64Serializer(CrossLanguageCompatibleSerializer):
+cdef class Float64Serializer(XlangCompatibleSerializer):
cpdef inline write(self, Buffer buffer, value):
buffer.write_double(value)
@@ -1493,7 +1765,7 @@ cdef class Float64Serializer(CrossLanguageCompatibleSerializer):
@cython.final
-cdef class StringSerializer(CrossLanguageCompatibleSerializer):
+cdef class StringSerializer(XlangCompatibleSerializer):
def __init__(self, fory, type_, track_ref=False):
super().__init__(fory, type_)
self.need_to_write_ref = track_ref
@@ -1509,7 +1781,7 @@ cdef _base_date = datetime.date(1970, 1, 1)
@cython.final
-cdef class DateSerializer(CrossLanguageCompatibleSerializer):
+cdef class DateSerializer(XlangCompatibleSerializer):
cpdef inline write(self, Buffer buffer, value):
if type(value) is not datetime.date:
raise TypeError(
@@ -1526,7 +1798,7 @@ cdef class DateSerializer(CrossLanguageCompatibleSerializer):
@cython.final
-cdef class TimestampSerializer(CrossLanguageCompatibleSerializer):
+cdef class TimestampSerializer(XlangCompatibleSerializer):
cdef bint win_platform
def __init__(self, fory, type_: Union[type, TypeVar]):
@@ -2099,7 +2371,7 @@ cdef class MapSerializer(Serializer):
else:
buffer.write_int8(VALUE_HAS_NULL | TRACKING_KEY_REF)
if is_py:
- fory.serialize_ref(buffer, key)
+ fory.write_ref(buffer, key)
else:
fory.xwrite_ref(buffer, key)
else:
@@ -2126,7 +2398,7 @@ cdef class MapSerializer(Serializer):
else:
buffer.write_int8(KEY_HAS_NULL | TRACKING_VALUE_REF)
if is_py:
- fory.serialize_ref(buffer, value)
+ fory.write_ref(buffer, value)
else:
fory.xwrite_ref(buffer, value)
else:
@@ -2411,7 +2683,7 @@ cdef class SliceSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- self.fory.serialize_nonref(buffer, start)
+ self.fory.write_no_ref(buffer, start)
if type(stop) is int:
# TODO support varint128
buffer.write_int16(NOT_NULL_INT64_FLAG)
@@ -2421,7 +2693,7 @@ cdef class SliceSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- self.fory.serialize_nonref(buffer, stop)
+ self.fory.write_no_ref(buffer, stop)
if type(step) is int:
# TODO support varint128
buffer.write_int16(NOT_NULL_INT64_FLAG)
@@ -2431,7 +2703,7 @@ cdef class SliceSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- self.fory.serialize_nonref(buffer, step)
+ self.fory.write_no_ref(buffer, step)
cpdef inline read(self, Buffer buffer):
if buffer.read_int8() == NULL_FLAG:
diff --git a/python/pyfory/serializer.py b/python/pyfory/serializer.py
index 14f4175bc..5eee1ea41 100644
--- a/python/pyfory/serializer.py
+++ b/python/pyfory/serializer.py
@@ -53,12 +53,12 @@ from pyfory._fory import (
_WINDOWS = os.name == "nt"
-from pyfory._serialization import ENABLE_FORY_CYTHON_SERIALIZATION
+from pyfory.serialization import ENABLE_FORY_CYTHON_SERIALIZATION
if ENABLE_FORY_CYTHON_SERIALIZATION:
- from pyfory._serialization import ( # noqa: F401, F811
+ from pyfory.serialization import ( # noqa: F401, F811
Serializer,
- CrossLanguageCompatibleSerializer,
+ XlangCompatibleSerializer,
BooleanSerializer,
ByteSerializer,
Int16Serializer,
@@ -81,7 +81,7 @@ if ENABLE_FORY_CYTHON_SERIALIZATION:
else:
from pyfory._serializer import ( # noqa: F401 # pylint: disable=unused-import
Serializer,
- CrossLanguageCompatibleSerializer,
+ XlangCompatibleSerializer,
BooleanSerializer,
ByteSerializer,
Int16Serializer,
@@ -103,11 +103,11 @@ else:
)
from pyfory.type import (
- Int16ArrayType,
- Int32ArrayType,
- Int64ArrayType,
- Float32ArrayType,
- Float64ArrayType,
+ int16_array,
+ int32_array,
+ int64_array,
+ float32_array,
+ float64_array,
BoolNDArrayType,
Int16NDArrayType,
Int32NDArrayType,
@@ -137,156 +137,6 @@ class NoneSerializer(Serializer):
return None
-__skip_class_attr_names__ = ("__module__", "__qualname__", "__dict__", "__weakref__")
-
-
-class TypeSerializer(Serializer):
- """Serializer for Python type objects (classes), including local classes."""
-
- def __init__(self, fory, cls):
- super().__init__(fory, cls)
- self.cls = cls
-
- def write(self, buffer, value):
- module_name = value.__module__
- qualname = value.__qualname__
-
- if module_name == "__main__" or "<locals>" in qualname:
- # Local class - serialize full context
- buffer.write_int8(1) # Local class marker
- self._serialize_local_class(buffer, value)
- else:
- buffer.write_int8(0) # Global class marker
- buffer.write_string(module_name)
- buffer.write_string(qualname)
-
- def read(self, buffer):
- class_type = buffer.read_int8()
-
- if class_type == 1:
- # Local class - deserialize from full context
- return self._deserialize_local_class(buffer)
- else:
- # Global class - import by module and name
- module_name = buffer.read_string()
- qualname = buffer.read_string()
- cls = importlib.import_module(module_name)
- for name in qualname.split("."):
- cls = getattr(cls, name)
- result = self.fory.policy.validate_class(cls, is_local=False)
- if result is not None:
- cls = result
- return cls
-
- def _serialize_local_class(self, buffer, cls):
- """Serialize a local class by capturing its creation context."""
- assert self.fory.ref_tracking, "Reference tracking must be enabled for local classes serialization"
- # Basic class information
- module = cls.__module__
- qualname = cls.__qualname__
- buffer.write_string(module)
- buffer.write_string(qualname)
- fory = self.fory
-
- # Serialize base classes
- # Let Fory's normal serialization handle bases (including other local classes)
- bases = cls.__bases__
- buffer.write_varuint32(len(bases))
- for base in bases:
- fory.serialize_ref(buffer, base)
-
- # Serialize class dictionary (excluding special attributes)
- # FunctionSerializer will automatically handle methods with closures
- class_dict = {}
- attr_names, class_methods = [], []
- for attr_name, attr_value in cls.__dict__.items():
- # Skip special attributes that are handled by type() constructor
- if attr_name in __skip_class_attr_names__:
- continue
- if isinstance(attr_value, classmethod):
- attr_names.append(attr_name)
- class_methods.append(attr_value)
- else:
- class_dict[attr_name] = attr_value
- # serialize method specially to avoid circular deps in method deserialization
- buffer.write_varuint32(len(class_methods))
- for i in range(len(class_methods)):
- buffer.write_string(attr_names[i])
- class_method = class_methods[i]
- fory.serialize_ref(buffer, class_method.__func__)
-
- # Let Fory's normal serialization handle the class dict
- # This will use FunctionSerializer for methods, which handles closures properly
- fory.serialize_ref(buffer, class_dict)
-
- def _deserialize_local_class(self, buffer):
- """Deserialize a local class by recreating it with the captured context."""
- fory = self.fory
- assert fory.ref_tracking, "Reference tracking must be enabled for local classes deserialization"
- # Read basic class information
- module = buffer.read_string()
- qualname = buffer.read_string()
- name = qualname.rsplit(".", 1)[-1]
- ref_id = fory.ref_resolver.last_preserved_ref_id()
-
- # Read base classes
- num_bases = buffer.read_varuint32()
- bases = tuple([fory.read_ref(buffer) for _ in range(num_bases)])
- # Create the class using type() constructor
- cls = type(name, bases, {})
- # `class_dict` may reference to `cls`, which is a circular reference
- fory.ref_resolver.set_read_object(ref_id, cls)
-
- # classmethods
- for i in range(buffer.read_varuint32()):
- attr_name = buffer.read_string()
- func = fory.read_ref(buffer)
- method = types.MethodType(func, cls)
- setattr(cls, attr_name, method)
- # Read class dictionary
- # Fory's normal deserialization will handle methods via FunctionSerializer
- class_dict = fory.read_ref(buffer)
- for k, v in class_dict.items():
- setattr(cls, k, v)
-
- # Set module and qualname
- cls.__module__ = module
- cls.__qualname__ = qualname
- result = fory.policy.validate_class(cls, is_local=True)
- if result is not None:
- cls = result
- return cls
-
-
-class ModuleSerializer(Serializer):
- """Serializer for python module"""
-
- def __init__(self, fory):
- super().__init__(fory, types.ModuleType)
-
- def write(self, buffer, value):
- buffer.write_string(value.__name__)
-
- def read(self, buffer):
- mod = buffer.read_string()
- mod = importlib.import_module(mod)
- result = self.fory.policy.validate_module(mod.__name__)
- if result is not None:
- mod = result
- return mod
-
-
-class MappingProxySerializer(Serializer):
- def __init__(self, fory):
- super().__init__(fory, types.MappingProxyType)
-
- def write(self, buffer, value):
- self.fory.serialize_ref(buffer, dict(value))
-
- def read(self, buffer):
- return types.MappingProxyType(self.fory.read_ref(buffer))
-
-
class PandasRangeIndexSerializer(Serializer):
__slots__ = "_cached"
@@ -308,7 +158,7 @@ class PandasRangeIndexSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- fory.serialize_nonref(buffer, start)
+ fory.write_no_ref(buffer, start)
if type(stop) is int:
buffer.write_int16(NOT_NULL_INT64_FLAG)
buffer.write_varint64(stop)
@@ -317,7 +167,7 @@ class PandasRangeIndexSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- fory.serialize_nonref(buffer, stop)
+ fory.write_no_ref(buffer, stop)
if type(step) is int:
buffer.write_int16(NOT_NULL_INT64_FLAG)
buffer.write_varint64(step)
@@ -326,9 +176,9 @@ class PandasRangeIndexSerializer(Serializer):
buffer.write_int8(NULL_FLAG)
else:
buffer.write_int8(NOT_NULL_VALUE_FLAG)
- fory.serialize_nonref(buffer, step)
- fory.serialize_ref(buffer, value.dtype)
- fory.serialize_ref(buffer, value.name)
+ fory.write_no_ref(buffer, step)
+ fory.write_ref(buffer, value.dtype)
+ fory.write_ref(buffer, value.name)
def read(self, buffer):
if buffer.read_int8() == NULL_FLAG:
@@ -997,19 +847,19 @@ class DataClassStubSerializer(DataClassSerializer):
typecode_dict = (
{
# use bytes serializer for byte array.
- "h": (2, Int16ArrayType, TypeId.INT16_ARRAY),
- "i": (4, Int32ArrayType, TypeId.INT32_ARRAY),
- "l": (8, Int64ArrayType, TypeId.INT64_ARRAY),
- "f": (4, Float32ArrayType, TypeId.FLOAT32_ARRAY),
- "d": (8, Float64ArrayType, TypeId.FLOAT64_ARRAY),
+ "h": (2, int16_array, TypeId.INT16_ARRAY),
+ "i": (4, int32_array, TypeId.INT32_ARRAY),
+ "l": (8, int64_array, TypeId.INT64_ARRAY),
+ "f": (4, float32_array, TypeId.FLOAT32_ARRAY),
+ "d": (8, float64_array, TypeId.FLOAT64_ARRAY),
}
if not _WINDOWS
else {
- "h": (2, Int16ArrayType, TypeId.INT16_ARRAY),
- "l": (4, Int32ArrayType, TypeId.INT32_ARRAY),
- "q": (8, Int64ArrayType, TypeId.INT64_ARRAY),
- "f": (4, Float32ArrayType, TypeId.FLOAT32_ARRAY),
- "d": (8, Float64ArrayType, TypeId.FLOAT64_ARRAY),
+ "h": (2, int16_array, TypeId.INT16_ARRAY),
+ "l": (4, int32_array, TypeId.INT32_ARRAY),
+ "q": (8, int64_array, TypeId.INT64_ARRAY),
+ "f": (4, float32_array, TypeId.FLOAT32_ARRAY),
+ "d": (8, float64_array, TypeId.FLOAT64_ARRAY),
}
)
@@ -1032,23 +882,23 @@ typeid_code = (
)
-class PyArraySerializer(CrossLanguageCompatibleSerializer):
+class PyArraySerializer(XlangCompatibleSerializer):
typecode_dict = typecode_dict
typecodearray_type = (
{
- "h": Int16ArrayType,
- "i": Int32ArrayType,
- "l": Int64ArrayType,
- "f": Float32ArrayType,
- "d": Float64ArrayType,
+ "h": int16_array,
+ "i": int32_array,
+ "l": int64_array,
+ "f": float32_array,
+ "d": float64_array,
}
if not _WINDOWS
else {
- "h": Int16ArrayType,
- "l": Int32ArrayType,
- "q": Int64ArrayType,
- "f": Float32ArrayType,
- "d": Float64ArrayType,
+ "h": int16_array,
+ "l": int32_array,
+ "q": int64_array,
+ "f": float32_array,
+ "d": float64_array,
}
)
@@ -1198,14 +1048,14 @@ class NDArraySerializer(Serializer):
def write(self, buffer, value):
fory = self.fory
dtype = value.dtype
- fory.serialize_ref(buffer, dtype)
+ fory.write_ref(buffer, dtype)
buffer.write_varuint32(len(value.shape))
for dim in value.shape:
buffer.write_varuint32(dim)
if dtype.kind == "O":
buffer.write_varint32(len(value))
for item in value:
- fory.serialize_ref(buffer, item)
+ fory.write_ref(buffer, item)
else:
fory.write_buffer_object(buffer, NDArrayBufferObject(value))
@@ -1226,7 +1076,7 @@ class NDArraySerializer(Serializer):
return np.frombuffer(fory_buf.to_pybytes(), dtype=dtype).reshape(shape)
-class BytesSerializer(CrossLanguageCompatibleSerializer):
+class BytesSerializer(XlangCompatibleSerializer):
def write(self, buffer, value):
self.fory.write_buffer_object(buffer, BytesBufferObject(value))
@@ -1258,7 +1108,7 @@ class BytesBufferObject(BufferObject):
return memoryview(self.binary)
-class PickleBufferSerializer(CrossLanguageCompatibleSerializer):
+class PickleBufferSerializer(XlangCompatibleSerializer):
def write(self, buffer, value):
self.fory.write_buffer_object(buffer, PickleBufferObject(value))
@@ -1316,7 +1166,7 @@ class NDArrayBufferObject(BufferObject):
return memoryview(self.array.tobytes())
-class StatefulSerializer(CrossLanguageCompatibleSerializer):
+class StatefulSerializer(XlangCompatibleSerializer):
"""
Serializer for objects that support __getstate__ and __setstate__.
Uses Fory's native serialization for better cross-language support.
@@ -1339,11 +1189,11 @@ class StatefulSerializer(CrossLanguageCompatibleSerializer):
args = self._getnewargs(value)
# Serialize constructor arguments first
- self.fory.serialize_ref(buffer, args)
- self.fory.serialize_ref(buffer, kwargs)
+ self.fory.write_ref(buffer, args)
+ self.fory.write_ref(buffer, kwargs)
# Then serialize the state
- self.fory.serialize_ref(buffer, state)
+ self.fory.write_ref(buffer, state)
def read(self, buffer):
fory = self.fory
@@ -1364,7 +1214,7 @@ class StatefulSerializer(CrossLanguageCompatibleSerializer):
return obj
-class ReduceSerializer(CrossLanguageCompatibleSerializer):
+class ReduceSerializer(XlangCompatibleSerializer):
"""
Serializer for objects that support __reduce__ or __reduce_ex__.
Uses Fory's native serialization for better cross-language support.
@@ -1429,7 +1279,7 @@ class ReduceSerializer(CrossLanguageCompatibleSerializer):
buffer.write_varuint32(len(reduce_data))
fory = self.fory
for item in reduce_data:
- fory.serialize_ref(buffer, item)
+ fory.write_ref(buffer, item)
def read(self, buffer):
reduce_data_num_items = buffer.read_varuint32()
@@ -1495,7 +1345,157 @@ class ReduceSerializer(CrossLanguageCompatibleSerializer):
raise ValueError(f"Invalid reduce data format: {reduce_data[0]}")
-class FunctionSerializer(CrossLanguageCompatibleSerializer):
+__skip_class_attr_names__ = ("__module__", "__qualname__", "__dict__", "__weakref__")
+
+
+class TypeSerializer(Serializer):
+ """Serializer for Python type objects (classes), including local classes."""
+
+ def __init__(self, fory, cls):
+ super().__init__(fory, cls)
+ self.cls = cls
+
+ def write(self, buffer, value):
+ module_name = value.__module__
+ qualname = value.__qualname__
+
+ if module_name == "__main__" or "<locals>" in qualname:
+ # Local class - serialize full context
+ buffer.write_int8(1) # Local class marker
+ self._serialize_local_class(buffer, value)
+ else:
+ buffer.write_int8(0) # Global class marker
+ buffer.write_string(module_name)
+ buffer.write_string(qualname)
+
+ def read(self, buffer):
+ class_type = buffer.read_int8()
+
+ if class_type == 1:
+ # Local class - deserialize from full context
+ return self._deserialize_local_class(buffer)
+ else:
+ # Global class - import by module and name
+ module_name = buffer.read_string()
+ qualname = buffer.read_string()
+ cls = importlib.import_module(module_name)
+ for name in qualname.split("."):
+ cls = getattr(cls, name)
+ result = self.fory.policy.validate_class(cls, is_local=False)
+ if result is not None:
+ cls = result
+ return cls
+
+ def _serialize_local_class(self, buffer, cls):
+ """Serialize a local class by capturing its creation context."""
+ assert self.fory.ref_tracking, "Reference tracking must be enabled for local classes serialization"
+ # Basic class information
+ module = cls.__module__
+ qualname = cls.__qualname__
+ buffer.write_string(module)
+ buffer.write_string(qualname)
+ fory = self.fory
+
+ # Serialize base classes
+ # Let Fory's normal serialization handle bases (including other local classes)
+ bases = cls.__bases__
+ buffer.write_varuint32(len(bases))
+ for base in bases:
+ fory.write_ref(buffer, base)
+
+ # Serialize class dictionary (excluding special attributes)
+ # FunctionSerializer will automatically handle methods with closures
+ class_dict = {}
+ attr_names, class_methods = [], []
+ for attr_name, attr_value in cls.__dict__.items():
+ # Skip special attributes that are handled by type() constructor
+ if attr_name in __skip_class_attr_names__:
+ continue
+ if isinstance(attr_value, classmethod):
+ attr_names.append(attr_name)
+ class_methods.append(attr_value)
+ else:
+ class_dict[attr_name] = attr_value
+ # serialize method specially to avoid circular deps in method deserialization
+ buffer.write_varuint32(len(class_methods))
+ for i in range(len(class_methods)):
+ buffer.write_string(attr_names[i])
+ class_method = class_methods[i]
+ fory.write_ref(buffer, class_method.__func__)
+
+ # Let Fory's normal serialization handle the class dict
+ # This will use FunctionSerializer for methods, which handles closures properly
+ fory.write_ref(buffer, class_dict)
+
+ def _deserialize_local_class(self, buffer):
+ """Deserialize a local class by recreating it with the captured context."""
+ fory = self.fory
+ assert fory.ref_tracking, "Reference tracking must be enabled for local classes deserialization"
+ # Read basic class information
+ module = buffer.read_string()
+ qualname = buffer.read_string()
+ name = qualname.rsplit(".", 1)[-1]
+ ref_id = fory.ref_resolver.last_preserved_ref_id()
+
+ # Read base classes
+ num_bases = buffer.read_varuint32()
+ bases = tuple([fory.read_ref(buffer) for _ in range(num_bases)])
+ # Create the class using type() constructor
+ cls = type(name, bases, {})
+ # `class_dict` may reference to `cls`, which is a circular reference
+ fory.ref_resolver.set_read_object(ref_id, cls)
+
+ # classmethods
+ for i in range(buffer.read_varuint32()):
+ attr_name = buffer.read_string()
+ func = fory.read_ref(buffer)
+ method = types.MethodType(func, cls)
+ setattr(cls, attr_name, method)
+ # Read class dictionary
+ # Fory's normal deserialization will handle methods via FunctionSerializer
+ class_dict = fory.read_ref(buffer)
+ for k, v in class_dict.items():
+ setattr(cls, k, v)
+
+ # Set module and qualname
+ cls.__module__ = module
+ cls.__qualname__ = qualname
+ result = fory.policy.validate_class(cls, is_local=True)
+ if result is not None:
+ cls = result
+ return cls
+
+
+class ModuleSerializer(Serializer):
+ """Serializer for python module"""
+
+ def __init__(self, fory):
+ super().__init__(fory, types.ModuleType)
+
+ def write(self, buffer, value):
+ buffer.write_string(value.__name__)
+
+ def read(self, buffer):
+ mod = buffer.read_string()
+ mod = importlib.import_module(mod)
+ result = self.fory.policy.validate_module(mod.__name__)
+ if result is not None:
+ mod = result
+ return mod
+
+
+class MappingProxySerializer(Serializer):
+ def __init__(self, fory):
+ super().__init__(fory, types.MappingProxyType)
+
+ def write(self, buffer, value):
+ self.fory.write_ref(buffer, dict(value))
+
+ def read(self, buffer):
+ return types.MappingProxyType(self.fory.read_ref(buffer))
+
+
+class FunctionSerializer(XlangCompatibleSerializer):
"""Serializer for function objects
This serializer captures all the necessary information to recreate a function:
@@ -1509,7 +1509,7 @@ class FunctionSerializer(CrossLanguageCompatibleSerializer):
The code object is serialized with marshal, and all other components
(defaults, globals, closure cells, attrs) go through Fory’s own
- serialize_ref/read_ref pipeline to ensure proper type registration
+ write_ref/read_ref pipeline to ensure proper type registration
and reference tracking.
"""
@@ -1537,7 +1537,7 @@ class FunctionSerializer(CrossLanguageCompatibleSerializer):
# Serialize as a tuple (is_method, self_obj, method_name)
buffer.write_int8(0) # is a method
# For the 'self' object, we need to use fory's serialization
- self.fory.serialize_ref(buffer, self_obj)
+ self.fory.write_ref(buffer, self_obj)
buffer.write_string(func_name)
return
@@ -1574,7 +1574,7 @@ class FunctionSerializer(CrossLanguageCompatibleSerializer):
buffer.write_varuint32(len(defaults))
# Serialize each default value individually
for default_value in defaults:
- self.fory.serialize_ref(buffer, default_value)
+ self.fory.write_ref(buffer, default_value)
# Handle closure
# We need to serialize both the closure values and the fact that there is a closure
@@ -1585,7 +1585,7 @@ class FunctionSerializer(CrossLanguageCompatibleSerializer):
if closure:
# Extract and serialize each closure cell's contents
for cell in closure:
- self.fory.serialize_ref(buffer, cell.cell_contents)
+ self.fory.write_ref(buffer, cell.cell_contents)
# Serialize free variable names as a list of strings
# Convert tuple to list since tuple might not be registered
@@ -1610,7 +1610,7 @@ class FunctionSerializer(CrossLanguageCompatibleSerializer):
# Create and serialize a dictionary with only the necessary globals
globals_to_serialize = {name: globals_dict[name] for name in global_names if name in globals_dict}
- self.fory.serialize_ref(buffer, globals_to_serialize)
+ self.fory.write_ref(buffer, globals_to_serialize)
# Handle additional attributes
attrs = {}
@@ -1624,7 +1624,7 @@ class FunctionSerializer(CrossLanguageCompatibleSerializer):
except (AttributeError, TypeError):
pass
- self.fory.serialize_ref(buffer, attrs)
+ self.fory.write_ref(buffer, attrs)
def _deserialize_function(self, buffer):
"""Deserialize a function from its components."""
@@ -1755,7 +1755,7 @@ class NativeFuncMethodSerializer(Serializer):
buffer.write_string(module)
else:
buffer.write_bool(False)
- self.fory.serialize_ref(buffer, obj)
+ self.fory.write_ref(buffer, obj)
def read(self, buffer):
name = buffer.read_string()
@@ -1784,7 +1784,7 @@ class MethodSerializer(Serializer):
instance = value.__self__
method_name = value.__func__.__name__
- self.fory.serialize_ref(buffer, instance)
+ self.fory.write_ref(buffer, instance)
buffer.write_string(method_name)
def read(self, buffer):
@@ -1833,7 +1833,7 @@ class ObjectSerializer(Serializer):
for field_name in sorted_field_names:
buffer.write_string(field_name)
field_value = getattr(value, field_name)
- self.fory.serialize_ref(buffer, field_value)
+ self.fory.write_ref(buffer, field_value)
def read(self, buffer):
fory = self.fory
diff --git a/python/pyfory/tests/test_cross_language.py b/python/pyfory/tests/test_cross_language.py
index 91e64ab2c..66da57cb6 100644
--- a/python/pyfory/tests/test_cross_language.py
+++ b/python/pyfory/tests/test_cross_language.py
@@ -211,7 +211,7 @@ def test_record_batch(data_file_path):
# debug_print(f"batch[0] {batch[0]}")
encoder = pyfory.create_row_encoder(create_foo_schema())
- writer = pyfory.ArrowWriter(create_foo_schema())
+ writer = pyfory.format.ArrowWriter(create_foo_schema())
num_rows = 128
for i in range(num_rows):
foo = create_foo()
@@ -436,7 +436,7 @@ class ComplexObject1:
f8: pyfory.int64 = None
f9: pyfory.float32 = None
f10: pyfory.float64 = None
- f11: pyfory.Int16ArrayType = None
+ f11: pyfory.int16_array = None
f12: List[pyfory.int16] = None
@@ -754,7 +754,7 @@ def test_cross_language_meta_share_complex(data_file_path):
f8: pyfory.int64
f9: pyfory.float32
f10: pyfory.float64
- f11: pyfory.Int16ArrayType
+ f11: pyfory.int16_array
f12: List[pyfory.int16]
fory.register_type(ComplexObject1, namespace="test", typename="ComplexObject1")
diff --git a/python/pyfory/tests/test_metastring_resolver.py b/python/pyfory/tests/test_metastring_resolver.py
index a2ecd5af3..21e3d3b23 100644
--- a/python/pyfory/tests/test_metastring_resolver.py
+++ b/python/pyfory/tests/test_metastring_resolver.py
@@ -16,7 +16,7 @@
# under the License.
from pyfory import Buffer
-from pyfory._serialization import MetaStringResolver, MetaStringBytes
+from pyfory.serialization import MetaStringResolver, MetaStringBytes
from pyfory.meta.metastring import MetaStringEncoder
diff --git a/python/pyfory/tests/test_reduce_serializer.py b/python/pyfory/tests/test_reduce_serializer.py
index c7703c61b..4cdb55552 100644
--- a/python/pyfory/tests/test_reduce_serializer.py
+++ b/python/pyfory/tests/test_reduce_serializer.py
@@ -303,5 +303,5 @@ def test_cross_language_compatibility():
assert deserialized == obj
# The serialized data should use Fory's native format, not pickle
- # This is verified by the fact that we're using serialize_ref/read_ref
+ # This is verified by the fact that we're using write_ref/read_ref
# in the ReduceSerializer implementation
diff --git a/python/pyfory/tests/test_serializer.py b/python/pyfory/tests/test_serializer.py
index 7b9f6197f..5be23916c 100644
--- a/python/pyfory/tests/test_serializer.py
+++ b/python/pyfory/tests/test_serializer.py
@@ -34,7 +34,7 @@ import pytest
import pyfory
from pyfory.buffer import Buffer
-from pyfory import Fory, Language, _serialization, EnumSerializer
+from pyfory import Fory, Language, serialization, EnumSerializer
from pyfory.serializer import (
TimestampSerializer,
DateSerializer,
@@ -131,11 +131,11 @@ def test_big_chunk_dict(track_ref):
def test_basic_serializer(language):
fory = Fory(language=language, ref=True)
typeinfo = fory.type_resolver.get_typeinfo(datetime.datetime)
- assert isinstance(typeinfo.serializer, (TimestampSerializer, _serialization.TimestampSerializer))
+ assert isinstance(typeinfo.serializer, (TimestampSerializer, serialization.TimestampSerializer))
if language == Language.XLANG:
assert typeinfo.type_id == TypeId.TIMESTAMP
typeinfo = fory.type_resolver.get_typeinfo(datetime.date)
- assert isinstance(typeinfo.serializer, (DateSerializer, _serialization.DateSerializer))
+ assert isinstance(typeinfo.serializer, (DateSerializer, serialization.DateSerializer))
if language == Language.XLANG:
assert typeinfo.type_id == TypeId.LOCAL_DATE
assert ser_de(fory, True) is True
@@ -550,7 +550,7 @@ def test_duplicate_serialize():
def test_pandas_range_index():
fory = Fory(xlang=False, ref=True, strict=False)
- fory.register_type(pd.RangeIndex, serializer=pyfory.PandasRangeIndexSerializer(fory))
+ fory.register_type(pd.RangeIndex, serializer=pyfory.serializer.PandasRangeIndexSerializer(fory))
index = pd.RangeIndex(1, 100, 2, name="a")
new_index = ser_de(fory, index)
pd.testing.assert_index_equal(new_index, new_index)
@@ -760,10 +760,10 @@ def test_module_serialize():
fory = Fory(xlang=False, ref=True, strict=False)
assert fory.loads(fory.dumps(pyfory)) is pyfory
from pyfory import serializer
- from pyfory import _serialization
+ from pyfory import serialization
assert fory.loads(fory.dumps(serializer)) is serializer
- assert fory.loads(fory.dumps(_serialization)) is _serialization
+ assert fory.loads(fory.dumps(serialization)) is serialization
import threading
assert fory.loads(fory.dumps(threading)) is threading
diff --git a/python/pyfory/type.py b/python/pyfory/type.py
index 9f766dfd2..349a25cf2 100644
--- a/python/pyfory/type.py
+++ b/python/pyfory/type.py
@@ -291,11 +291,11 @@ def get_primitive_type_size(type_id) -> int:
# Int8ArrayType = TypeVar("Int8ArrayType", bound=array.ArrayType)
BoolArrayType = TypeVar("BoolArrayType")
-Int16ArrayType = TypeVar("Int16ArrayType", bound=array.ArrayType)
-Int32ArrayType = TypeVar("Int32ArrayType", bound=array.ArrayType)
-Int64ArrayType = TypeVar("Int64ArrayType", bound=array.ArrayType)
-Float32ArrayType = TypeVar("Float32ArrayType", bound=array.ArrayType)
-Float64ArrayType = TypeVar("Float64ArrayType", bound=array.ArrayType)
+int16_array = TypeVar("int16_array", bound=array.ArrayType)
+int32_array = TypeVar("int32_array", bound=array.ArrayType)
+int64_array = TypeVar("int64_array", bound=array.ArrayType)
+float32_array = TypeVar("float32_array", bound=array.ArrayType)
+float64_array = TypeVar("float64_array", bound=array.ArrayType)
BoolNDArrayType = TypeVar("BoolNDArrayType", bound=ndarray)
Int16NDArrayType = TypeVar("Int16NDArrayType", bound=ndarray)
Int32NDArrayType = TypeVar("Int32NDArrayType", bound=ndarray)
@@ -306,11 +306,11 @@ Float64NDArrayType = TypeVar("Float64NDArrayType", bound=ndarray)
_py_array_types = {
# Int8ArrayType,
- Int16ArrayType,
- Int32ArrayType,
- Int64ArrayType,
- Float32ArrayType,
- Float64ArrayType,
+ int16_array,
+ int32_array,
+ int64_array,
+ float32_array,
+ float64_array,
}
_np_array_types = {
BoolNDArrayType,
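
Note on the relocated `TypeSerializer` in python/pyfory/serializer.py: the sketch below is a standalone illustration of the reconstruction order that `_deserialize_local_class` follows in the diff (create an empty class with `type()`, bind classmethods with `types.MethodType`, copy the remaining attributes, then restore `__module__` and `__qualname__`). It assumes plain Python values in place of Fory's buffer and ref resolver, and it skips policy validation. The names `capture_class`, `rebuild_class`, and `make_local_class` are hypothetical and are not part of pyfory.

```python
import types

# Same attribute names that the diff skips when capturing a class body.
SKIP_ATTRS = ("__module__", "__qualname__", "__dict__", "__weakref__")


def capture_class(cls):
    """Split a class into the pieces the diff writes: names, bases, classmethods, other attrs."""
    class_methods, class_dict = {}, {}
    for name, value in cls.__dict__.items():
        if name in SKIP_ATTRS:
            continue
        if isinstance(value, classmethod):
            # Keep only the underlying function; it is re-bound on rebuild.
            class_methods[name] = value.__func__
        else:
            class_dict[name] = value
    return cls.__module__, cls.__qualname__, cls.__bases__, class_methods, class_dict


def rebuild_class(module, qualname, bases, class_methods, class_dict):
    """Recreate the class in the same order as the diff's _deserialize_local_class."""
    name = qualname.rsplit(".", 1)[-1]
    # Create an empty class first so later attributes may refer back to it.
    cls = type(name, bases, {})
    for attr_name, func in class_methods.items():
        setattr(cls, attr_name, types.MethodType(func, cls))
    for key, value in class_dict.items():
        setattr(cls, key, value)
    cls.__module__ = module
    cls.__qualname__ = qualname
    return cls


def make_local_class():
    class Point:
        scale = 2

        def __init__(self, x):
            self.x = x

        def scaled(self):
            return self.x * self.scale

        @classmethod
        def origin(cls):
            return cls(0)

    return Point


Original = make_local_class()
Rebuilt = rebuild_class(*capture_class(Original))
```

Creating the empty class before populating it is what lets a real deserializer register the class with its reference resolver early, so that attributes which point back at the class can be resolved.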