This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git


The following commit(s) were added to refs/heads/main by this push:
     new ccecc39be perf(python): add python benchmark suite (#3448)
ccecc39be is described below

commit ccecc39bee412a02aa6e6ad76778347a4832aa81
Author: Shawn Yang <[email protected]>
AuthorDate: Tue Mar 3 18:34:17 2026 +0800

    perf(python): add python benchmark suite (#3448)
    
    ## Why?
    
    
    
    ## What does this PR do?
    
    
    
    ## Related issues
    
    #1017
    #3443
    
    ## Does this PR introduce any user-facing change?
    
    
    
    - [ ] Does this PR introduce any public API change?
    - [ ] Does this PR introduce any binary protocol compatibility change?
    
    ## Benchmark
---
 AGENTS.md                                   |   1 +
 benchmarks/python/README.md                 | 257 ++-------
 benchmarks/python/benchmark.py              | 808 ++++++++++++++++++++++++++++
 benchmarks/python/benchmark_report.py       | 439 +++++++++++++++
 benchmarks/python/run.sh                    | 198 +++++++
 docs/benchmarks/python/README.md            | 127 +++++
 docs/benchmarks/python/mediacontent.png     | Bin 0 -> 49948 bytes
 docs/benchmarks/python/mediacontentlist.png | Bin 0 -> 57263 bytes
 docs/benchmarks/python/sample.png           | Bin 0 -> 53682 bytes
 docs/benchmarks/python/samplelist.png       | Bin 0 -> 60171 bytes
 docs/benchmarks/python/struct.png           | Bin 0 -> 54218 bytes
 docs/benchmarks/python/structlist.png       | Bin 0 -> 52290 bytes
 docs/benchmarks/python/throughput.png       | Bin 0 -> 79016 bytes
 python/pyfory/serialization.pyx             |   2 +-
 python/pyfory/struct.pxi                    |  25 +-
 15 files changed, 1621 insertions(+), 236 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index d0d1a01e3..f2d997653 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -25,6 +25,7 @@ While working on Fory, please remember:
 - **Primary references**: `README.md`, `CONTRIBUTING.md`, `docs/guide/DEVELOPMENT.md`, and language guides under `docs/guide/`.
 - **Protocol changes**: Read and update the relevant specs in `docs/specification/**` and align cross-language tests.
 - **Docs publishing**: Updates under `docs/guide/` and `docs/benchmarks/` are synced to https://github.com/apache/fory-site; other website content should be changed in that repo.
+- **Benchmark docs refresh is mandatory**: When any benchmark logic/script/config or compared serializer set changes, rerun the relevant benchmarks and refresh corresponding artifacts under `docs/benchmarks/**` (report + plots) before finalizing.
 - **Debugging docs**: C++ debugging guidance lives in `docs/cpp_debug.md`.
 - **Conflicts**: If instructions conflict, follow the most specific module docs and call out the conflict in your response.
 
diff --git a/benchmarks/python/README.md b/benchmarks/python/README.md
index 1cc20d5ec..786a56e00 100644
--- a/benchmarks/python/README.md
+++ b/benchmarks/python/README.md
@@ -1,244 +1,63 @@
-# Apache Fory™ CPython Benchmark
+# Apache Fory Python Benchmarks
 
-Microbenchmark comparing Apache Fory™ and Pickle serialization performance in CPython.
+This directory contains two benchmark entrypoints:
 
-## Quick Start
+1. `benchmark.py` + `run.sh` (new): C++-parity benchmark matrix covering:
+   - `Struct`, `Sample`, `MediaContent`
+   - `StructList`, `SampleList`, `MediaContentList`
+   - operations: `serialize`, `deserialize`
+   - serializers: `fory`, `pickle`, `protobuf`
+2. `fory_benchmark.py` (legacy): the existing CPython microbenchmark script, kept intact.
 
-### Step 1: Install Apache Fory™ into Python
-
-Follow the installation instructions from the main documentation.
-
-### Step 2: Execute the benchmark script
-
-```bash
-python fory_benchmark.py
-```
-
-This will run all benchmarks with both Fory and Pickle serializers using default settings.
-
-## Usage
-
-### Basic Usage
-
-```bash
-# Run all benchmarks with both Fory and Pickle
-python fory_benchmark.py
-
-# Run all benchmarks without reference tracking
-python fory_benchmark.py --no-ref
-
-# Run specific benchmarks
-python fory_benchmark.py --benchmarks dict,large_dict,complex
-
-# Compare only Fory performance
-python fory_benchmark.py --serializers fory
-
-# Compare only Pickle performance
-python fory_benchmark.py --serializers pickle
-
-# Run with more iterations for better accuracy
-python fory_benchmark.py --iterations 50 --repeat 10
-
-# Debug with pure Python mode
-python fory_benchmark.py --disable-cython --benchmarks dict
-```
-
-## Command-Line Options
-
-### Benchmark Selection
-
-#### `--benchmarks BENCHMARK_LIST`
-
-Comma-separated list of benchmarks to run. Default: `all`
-
-Available benchmarks:
-
-- `dict` - Small dictionary serialization (28 fields with mixed types)
-- `large_dict` - Large dictionary (2^10 + 1 entries)
-- `dict_group` - Group of 3 dictionaries
-- `tuple` - Small tuple with nested list
-- `large_tuple` - Large tuple (2^20 + 1 integers)
-- `large_float_tuple` - Large tuple of floats (2^20 + 1 elements)
-- `large_boolean_tuple` - Large tuple of booleans (2^20 + 1 elements)
-- `list` - Nested lists (10x10x10 structure)
-- `large_list` - Large list (2^20 + 1 integers)
-- `complex` - Complex dataclass objects with nested structures
-
-Examples:
-
-```bash
-# Run only dictionary benchmarks
-python fory_benchmark.py --benchmarks dict,large_dict,dict_group
-
-# Run only large data benchmarks
-python fory_benchmark.py --benchmarks large_dict,large_tuple,large_list
-
-# Run only the complex object benchmark
-python fory_benchmark.py --benchmarks complex
-```
-
-#### `--serializers SERIALIZER_LIST`
-
-Comma-separated list of serializers to benchmark. Default: `all`
-
-Available serializers:
-
-- `fory` - Apache Fory™ serialization
-- `pickle` - Python's built-in pickle serialization
-
-Examples:
-
-```bash
-# Compare both serializers (default)
-python fory_benchmark.py --serializers fory,pickle
-
-# Benchmark only Fory
-python fory_benchmark.py --serializers fory
-
-# Benchmark only Pickle
-python fory_benchmark.py --serializers pickle
-```
-
-### Fory Configuration
-
-#### `--no-ref`
-
-Disable reference tracking for Fory. By default, Fory tracks references to handle shared and circular references.
-
-```bash
-# Run without reference tracking
-python fory_benchmark.py --no-ref
-```
-
-#### `--disable-cython`
-
-Use pure Python mode instead of Cython serialization for Fory. Useful for debugging protocol issues.
-
-```bash
-# Use pure Python serialization
-python fory_benchmark.py --disable-cython
-```
-
-### Benchmark Parameters
-
-These options control the benchmark measurement process:
-
-#### `--warmup N`
-
-Number of warmup iterations before measurement starts. Default: `3`
-
-```bash
-python fory_benchmark.py --warmup 5
-```
-
-#### `--iterations N`
-
-Number of measurement iterations to collect. Default: `20`
-
-```bash
-python fory_benchmark.py --iterations 50
-```
-
-#### `--repeat N`
-
-Number of times to repeat each iteration. Default: `5`
+## Quick Start (Comprehensive Suite)
 
 ```bash
-python fory_benchmark.py --repeat 10
+cd benchmarks/python
+./run.sh
 ```
 
-#### `--number N`
+`run.sh` will:
 
-Number of times to call the serialization function per measurement (inner loop). Default: `100`
-
-```bash
-python fory_benchmark.py --number 1000
-```
+1. Generate Python protobuf bindings from `benchmarks/proto/bench.proto`
+2. Run `benchmark.py`
+3. Generate plots + markdown report via `benchmark_report.py`
+4. Copy report/plots to `docs/benchmarks/python`
 
-#### `--help`
-
-Display help message and exit.
+### Common Options
 
 ```bash
-python fory_benchmark.py --help
-```
-
-## Examples
+# Run only Struct benchmarks for Fory serialize
+./run.sh --data struct --serializer fory --operation serialize
 
-### Running Specific Comparisons
-
-```bash
-# Compare Fory and Pickle on dictionary benchmarks
-python fory_benchmark.py --benchmarks dict,large_dict,dict_group
+# Run all data types, deserialize only
+./run.sh --operation deserialize
 
-# Compare performance without reference tracking
-python fory_benchmark.py --no-ref
+# Adjust benchmark loops
+./run.sh --warmup 5 --iterations 30 --repeat 8 --number 1500
 
-# Test only Fory with high precision
-python fory_benchmark.py --serializers fory --iterations 100 --repeat 10
+# Skip docs sync
+./run.sh --no-copy-docs
 ```
 
-### Performance Tuning
+Supported values:
 
-```bash
-# Quick test with fewer iterations
-python fory_benchmark.py --warmup 1 --iterations 5 --repeat 3
-
-# High-precision benchmark
-python fory_benchmark.py --warmup 10 --iterations 100 --repeat 10
+- `--data`: `struct,sample,mediacontent,structlist,samplelist,mediacontentlist`
+- `--serializer`: `fory,pickle,protobuf`
+- `--operation`: `all|serialize|deserialize`
 
-# Benchmark large data structures with more inner loop iterations
-python fory_benchmark.py --benchmarks large_list,large_tuple --number 1000
-```
+## Legacy Script (Unchanged)
 
-### Debugging and Development
+`fory_benchmark.py` remains unchanged and can still be used directly:
 
 ```bash
-# Debug protocol issues with pure Python mode
-python fory_benchmark.py --disable-cython --benchmarks dict
-
-# Test complex objects only
-python fory_benchmark.py --benchmarks complex --iterations 10
-
-# Compare Fory with and without ref tracking
-python fory_benchmark.py --serializers fory --benchmarks dict
-python fory_benchmark.py --serializers fory --benchmarks dict --no-ref
+cd benchmarks/python
+python fory_benchmark.py
 ```
 
-## Output Format
-
-The benchmark script provides three sections of output:
-
-1. **Progress**: Real-time progress as each benchmark runs
-2. **Summary**: Table of all results showing mean time and standard deviation
-3. **Speedup**: Comparison table showing Fory speedup vs Pickle (only when both serializers are tested)
+For its original options and behavior, refer to `python fory_benchmark.py --help`.
 
-Example output:
+## Notes
 
-```
-Benchmarking 3 benchmark(s) with 2 serializer(s)
-Warmup: 3, Iterations: 20, Repeat: 5, Inner loop: 100
-Fory reference tracking: enabled
-================================================================================
-
-Running fory_dict... 12.34 us ± 0.56 us
-Running pickle_dict... 45.67 us ± 1.23 us
-...
-
-================================================================================
-SUMMARY
-================================================================================
-Serializer      Benchmark                 Mean                 Std Dev
---------------------------------------------------------------------------------
-fory            dict                      12.34 us             0.56 us
-pickle          dict                      45.67 us             1.23 us
-...
-
-================================================================================
-SPEEDUP (Fory vs Pickle)
-================================================================================
-Benchmark                 Fory                 Pickle               Speedup
---------------------------------------------------------------------------------
-dict                      12.34 us             45.67 us             3.70x
-...
-```
+- `pyfory` must be installed in your current Python environment.
+- `protoc` is required by `run.sh` to generate `bench_pb2.py`.
+- `protobuf` benchmarks include dataclass <-> protobuf conversion in the timed path.
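
That last note matters when comparing numbers: for the protobuf cases, each timed call covers the dataclass-to-message conversion as well as encoding, not encoding alone. A minimal stdlib-only sketch of that timed-path shape (`Point`, `to_message`, and `timed_call` are hypothetical stand-ins, not names from the suite):

```python
import timeit
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: int
    y: int


def to_message(obj: Point) -> bytes:
    # Hypothetical stand-in for the dataclass -> protobuf message
    # conversion followed by SerializeToString().
    return repr(asdict(obj)).encode()


def timed_call(obj: Point) -> bytes:
    # Conversion and encoding both happen inside the timed callable,
    # mirroring how the protobuf benchmarks are measured.
    return to_message(obj)


total = timeit.timeit(lambda: timed_call(Point(1, 2)), number=1000)
per_call_seconds = total / 1000
```

Keeping the conversion inside the timed path makes the protobuf numbers comparable to Fory and pickle, which also start from the plain dataclass objects.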
diff --git a/benchmarks/python/benchmark.py b/benchmarks/python/benchmark.py
new file mode 100755
index 000000000..de85a568f
--- /dev/null
+++ b/benchmarks/python/benchmark.py
@@ -0,0 +1,808 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Comprehensive Python benchmark suite for C++ parity benchmark objects.
+
+This script mirrors `benchmarks/cpp/benchmark.cc` coverage and benchmarks:
+- Data types: Struct, Sample, MediaContent and corresponding *List variants.
+- Operations: serialize / deserialize.
+- Serializers: fory / pickle / protobuf.
+
+Results are written as JSON and consumed by `benchmark_report.py`.
+"""
+
+from __future__ import annotations
+
+import argparse
+from dataclasses import dataclass
+import json
+import os
+import pickle
+import platform
+import statistics
+import sys
+import timeit
+from pathlib import Path
+from typing import Any, Callable, Dict, Iterable, List, Tuple
+
+import pyfory
+
+
+LIST_SIZE = 5
+DATA_TYPE_ORDER = [
+    "struct",
+    "sample",
+    "mediacontent",
+    "structlist",
+    "samplelist",
+    "mediacontentlist",
+]
+SERIALIZER_ORDER = ["fory", "pickle", "protobuf"]
+OPERATION_ORDER = ["serialize", "deserialize"]
+
+DATA_LABELS = {
+    "struct": "Struct",
+    "sample": "Sample",
+    "mediacontent": "MediaContent",
+    "structlist": "StructList",
+    "samplelist": "SampleList",
+    "mediacontentlist": "MediaContentList",
+}
+SERIALIZER_LABELS = {
+    "fory": "Fory",
+    "pickle": "Pickle",
+    "protobuf": "Protobuf",
+}
+
+
+@dataclass
+class NumericStruct:
+    f1: int
+    f2: int
+    f3: int
+    f4: int
+    f5: int
+    f6: int
+    f7: int
+    f8: int
+
+
+@dataclass
+class Sample:
+    int_value: int
+    long_value: int
+    float_value: float
+    double_value: float
+    short_value: int
+    char_value: int
+    boolean_value: bool
+    int_value_boxed: int
+    long_value_boxed: int
+    float_value_boxed: float
+    double_value_boxed: float
+    short_value_boxed: int
+    char_value_boxed: int
+    boolean_value_boxed: bool
+    int_array: List[int]
+    long_array: List[int]
+    float_array: List[float]
+    double_array: List[float]
+    short_array: List[int]
+    char_array: List[int]
+    boolean_array: List[bool]
+    string: str
+
+
+@dataclass
+class Media:
+    uri: str
+    title: str
+    width: int
+    height: int
+    format: str
+    duration: int
+    size: int
+    bitrate: int
+    has_bitrate: bool
+    persons: List[str]
+    player: int
+    copyright: str
+
+
+@dataclass
+class Image:
+    uri: str
+    title: str
+    width: int
+    height: int
+    size: int
+
+
+@dataclass
+class MediaContent:
+    media: Media
+    images: List[Image]
+
+
+@dataclass
+class StructList:
+    struct_list: List[NumericStruct]
+
+
+@dataclass
+class SampleList:
+    sample_list: List[Sample]
+
+
+@dataclass
+class MediaContentList:
+    media_content_list: List[MediaContent]
+
+
+def create_numeric_struct() -> NumericStruct:
+    return NumericStruct(
+        f1=-12345,
+        f2=987654321,
+        f3=-31415,
+        f4=27182818,
+        f5=-32000,
+        f6=1000000,
+        f7=-999999999,
+        f8=42,
+    )
+
+
+def create_sample() -> Sample:
+    return Sample(
+        int_value=123,
+        long_value=1230000,
+        float_value=12.345,
+        double_value=1.234567,
+        short_value=12345,
+        char_value=ord("!"),
+        boolean_value=True,
+        int_value_boxed=321,
+        long_value_boxed=3210000,
+        float_value_boxed=54.321,
+        double_value_boxed=7.654321,
+        short_value_boxed=32100,
+        char_value_boxed=ord("$"),
+        boolean_value_boxed=False,
+        int_array=[-1234, -123, -12, -1, 0, 1, 12, 123, 1234],
+        long_array=[-123400, -12300, -1200, -100, 0, 100, 1200, 12300, 123400],
+        float_array=[-12.34, -12.3, -12.0, -1.0, 0.0, 1.0, 12.0, 12.3, 12.34],
+        double_array=[-1.234, -1.23, -12.0, -1.0, 0.0, 1.0, 12.0, 1.23, 1.234],
+        short_array=[-1234, -123, -12, -1, 0, 1, 12, 123, 1234],
+        char_array=[ord(c) for c in "asdfASDF"],
+        boolean_array=[True, False, False, True],
+        string="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
+    )
+
+
+def create_media_content() -> MediaContent:
+    media = Media(
+        uri="http://javaone.com/keynote.ogg",
+        title="",
+        width=641,
+        height=481,
+        format="video/theora\u1234",
+        duration=18000001,
+        size=58982401,
+        bitrate=0,
+        has_bitrate=False,
+        persons=["Bill Gates, Jr.", "Steven Jobs"],
+        player=1,
+        copyright="Copyright (c) 2009, Scooby Dooby Doo",
+    )
+    images = [
+        Image(
+            uri="http://javaone.com/keynote_huge.jpg",
+            title="Javaone Keynote\u1234",
+            width=32000,
+            height=24000,
+            size=1,
+        ),
+        Image(
+            uri="http://javaone.com/keynote_large.jpg",
+            title="",
+            width=1024,
+            height=768,
+            size=1,
+        ),
+        Image(
+            uri="http://javaone.com/keynote_small.jpg",
+            title="",
+            width=320,
+            height=240,
+            size=0,
+        ),
+    ]
+    return MediaContent(media=media, images=images)
+
+
+def create_struct_list() -> StructList:
+    return StructList(struct_list=[create_numeric_struct() for _ in range(LIST_SIZE)])
+
+
+def create_sample_list() -> SampleList:
+    return SampleList(sample_list=[create_sample() for _ in range(LIST_SIZE)])
+
+
+def create_media_content_list() -> MediaContentList:
+    return MediaContentList(
+        media_content_list=[create_media_content() for _ in range(LIST_SIZE)]
+    )
+
+
+def create_benchmark_data() -> Dict[str, Any]:
+    return {
+        "struct": create_numeric_struct(),
+        "sample": create_sample(),
+        "mediacontent": create_media_content(),
+        "structlist": create_struct_list(),
+        "samplelist": create_sample_list(),
+        "mediacontentlist": create_media_content_list(),
+    }
+
+
+def load_bench_pb2(proto_dir: Path):
+    bench_pb2_path = proto_dir / "bench_pb2.py"
+    if not bench_pb2_path.exists():
+        raise FileNotFoundError(
+            f"{bench_pb2_path} does not exist. Run benchmarks/python/run.sh first to generate protobuf bindings."
+        )
+    proto_dir_abs = str(proto_dir.resolve())
+    if proto_dir_abs not in sys.path:
+        sys.path.insert(0, proto_dir_abs)
+    import bench_pb2  # type: ignore
+
+    return bench_pb2
+
+
+def to_pb_struct(bench_pb2, obj: NumericStruct):
+    pb = bench_pb2.Struct()
+    pb.f1 = obj.f1
+    pb.f2 = obj.f2
+    pb.f3 = obj.f3
+    pb.f4 = obj.f4
+    pb.f5 = obj.f5
+    pb.f6 = obj.f6
+    pb.f7 = obj.f7
+    pb.f8 = obj.f8
+    return pb
+
+
+def from_pb_struct(pb_obj) -> NumericStruct:
+    return NumericStruct(
+        f1=pb_obj.f1,
+        f2=pb_obj.f2,
+        f3=pb_obj.f3,
+        f4=pb_obj.f4,
+        f5=pb_obj.f5,
+        f6=pb_obj.f6,
+        f7=pb_obj.f7,
+        f8=pb_obj.f8,
+    )
+
+
+def to_pb_sample(bench_pb2, obj: Sample):
+    pb = bench_pb2.Sample()
+    pb.int_value = obj.int_value
+    pb.long_value = obj.long_value
+    pb.float_value = obj.float_value
+    pb.double_value = obj.double_value
+    pb.short_value = obj.short_value
+    pb.char_value = obj.char_value
+    pb.boolean_value = obj.boolean_value
+    pb.int_value_boxed = obj.int_value_boxed
+    pb.long_value_boxed = obj.long_value_boxed
+    pb.float_value_boxed = obj.float_value_boxed
+    pb.double_value_boxed = obj.double_value_boxed
+    pb.short_value_boxed = obj.short_value_boxed
+    pb.char_value_boxed = obj.char_value_boxed
+    pb.boolean_value_boxed = obj.boolean_value_boxed
+    pb.int_array.extend(obj.int_array)
+    pb.long_array.extend(obj.long_array)
+    pb.float_array.extend(obj.float_array)
+    pb.double_array.extend(obj.double_array)
+    pb.short_array.extend(obj.short_array)
+    pb.char_array.extend(obj.char_array)
+    pb.boolean_array.extend(obj.boolean_array)
+    pb.string = obj.string
+    return pb
+
+
+def from_pb_sample(pb_obj) -> Sample:
+    return Sample(
+        int_value=pb_obj.int_value,
+        long_value=pb_obj.long_value,
+        float_value=pb_obj.float_value,
+        double_value=pb_obj.double_value,
+        short_value=pb_obj.short_value,
+        char_value=pb_obj.char_value,
+        boolean_value=pb_obj.boolean_value,
+        int_value_boxed=pb_obj.int_value_boxed,
+        long_value_boxed=pb_obj.long_value_boxed,
+        float_value_boxed=pb_obj.float_value_boxed,
+        double_value_boxed=pb_obj.double_value_boxed,
+        short_value_boxed=pb_obj.short_value_boxed,
+        char_value_boxed=pb_obj.char_value_boxed,
+        boolean_value_boxed=pb_obj.boolean_value_boxed,
+        int_array=list(pb_obj.int_array),
+        long_array=list(pb_obj.long_array),
+        float_array=list(pb_obj.float_array),
+        double_array=list(pb_obj.double_array),
+        short_array=list(pb_obj.short_array),
+        char_array=list(pb_obj.char_array),
+        boolean_array=list(pb_obj.boolean_array),
+        string=pb_obj.string,
+    )
+
+
+def to_pb_image(bench_pb2, obj: Image):
+    pb = bench_pb2.Image()
+    pb.uri = obj.uri
+    if obj.title:
+        pb.title = obj.title
+    pb.width = obj.width
+    pb.height = obj.height
+    pb.size = obj.size
+    return pb
+
+
+def from_pb_image(pb_obj) -> Image:
+    title = pb_obj.title if pb_obj.HasField("title") else ""
+    return Image(
+        uri=pb_obj.uri,
+        title=title,
+        width=pb_obj.width,
+        height=pb_obj.height,
+        size=pb_obj.size,
+    )
+
+
+def to_pb_media(bench_pb2, obj: Media):
+    pb = bench_pb2.Media()
+    pb.uri = obj.uri
+    if obj.title:
+        pb.title = obj.title
+    pb.width = obj.width
+    pb.height = obj.height
+    pb.format = obj.format
+    pb.duration = obj.duration
+    pb.size = obj.size
+    pb.bitrate = obj.bitrate
+    pb.has_bitrate = obj.has_bitrate
+    pb.persons.extend(obj.persons)
+    pb.player = obj.player
+    pb.copyright = obj.copyright
+    return pb
+
+
+def from_pb_media(pb_obj) -> Media:
+    title = pb_obj.title if pb_obj.HasField("title") else ""
+    return Media(
+        uri=pb_obj.uri,
+        title=title,
+        width=pb_obj.width,
+        height=pb_obj.height,
+        format=pb_obj.format,
+        duration=pb_obj.duration,
+        size=pb_obj.size,
+        bitrate=pb_obj.bitrate,
+        has_bitrate=pb_obj.has_bitrate,
+        persons=list(pb_obj.persons),
+        player=pb_obj.player,
+        copyright=pb_obj.copyright,
+    )
+
+
+def to_pb_mediacontent(bench_pb2, obj: MediaContent):
+    pb = bench_pb2.MediaContent()
+    pb.media.CopyFrom(to_pb_media(bench_pb2, obj.media))
+    for image in obj.images:
+        pb.images.add().CopyFrom(to_pb_image(bench_pb2, image))
+    return pb
+
+
+def from_pb_mediacontent(pb_obj) -> MediaContent:
+    return MediaContent(
+        media=from_pb_media(pb_obj.media),
+        images=[from_pb_image(img) for img in pb_obj.images],
+    )
+
+
+def to_pb_structlist(bench_pb2, obj: StructList):
+    pb = bench_pb2.StructList()
+    for item in obj.struct_list:
+        pb.struct_list.add().CopyFrom(to_pb_struct(bench_pb2, item))
+    return pb
+
+
+def from_pb_structlist(pb_obj) -> StructList:
+    return StructList(struct_list=[from_pb_struct(item) for item in pb_obj.struct_list])
+
+
+def to_pb_samplelist(bench_pb2, obj: SampleList):
+    pb = bench_pb2.SampleList()
+    for item in obj.sample_list:
+        pb.sample_list.add().CopyFrom(to_pb_sample(bench_pb2, item))
+    return pb
+
+
+def from_pb_samplelist(pb_obj) -> SampleList:
+    return SampleList(sample_list=[from_pb_sample(item) for item in pb_obj.sample_list])
+
+
+def to_pb_mediacontentlist(bench_pb2, obj: MediaContentList):
+    pb = bench_pb2.MediaContentList()
+    for item in obj.media_content_list:
+        pb.media_content_list.add().CopyFrom(to_pb_mediacontent(bench_pb2, item))
+    return pb
+
+
+def from_pb_mediacontentlist(pb_obj) -> MediaContentList:
+    return MediaContentList(
+        media_content_list=[
+            from_pb_mediacontent(item) for item in pb_obj.media_content_list
+        ]
+    )
+
+
+PROTO_CONVERTERS = {
+    "struct": (to_pb_struct, from_pb_struct, "Struct"),
+    "sample": (to_pb_sample, from_pb_sample, "Sample"),
+    "mediacontent": (to_pb_mediacontent, from_pb_mediacontent, "MediaContent"),
+    "structlist": (to_pb_structlist, from_pb_structlist, "StructList"),
+    "samplelist": (to_pb_samplelist, from_pb_samplelist, "SampleList"),
+    "mediacontentlist": (
+        to_pb_mediacontentlist,
+        from_pb_mediacontentlist,
+        "MediaContentList",
+    ),
+}
+
+
+def build_fory() -> pyfory.Fory:
+    fory = pyfory.Fory(xlang=True, compatible=True, ref=False)
+    fory.register_type(NumericStruct, type_id=1)
+    fory.register_type(Sample, type_id=2)
+    fory.register_type(Media, type_id=3)
+    fory.register_type(Image, type_id=4)
+    fory.register_type(MediaContent, type_id=5)
+    fory.register_type(StructList, type_id=6)
+    fory.register_type(SampleList, type_id=7)
+    fory.register_type(MediaContentList, type_id=8)
+    return fory
+
+
+def run_benchmark(
+    func: Callable[..., Any],
+    args: Tuple[Any, ...],
+    *,
+    warmup: int,
+    iterations: int,
+    repeat: int,
+    number: int,
+) -> Tuple[float, float]:
+    for _ in range(warmup):
+        for _ in range(number):
+            func(*args)
+
+    samples: List[float] = []
+    for _ in range(iterations):
+        timer = timeit.Timer(lambda: func(*args))
+        loop_times = timer.repeat(repeat=repeat, number=number)
+        samples.extend([time_total / number for time_total in loop_times])
+
+    mean = statistics.mean(samples)
+    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
+    return mean, stdev
+
+
+def format_time(seconds: float) -> str:
+    if seconds < 1e-6:
+        return f"{seconds * 1e9:.2f} ns"
+    if seconds < 1e-3:
+        return f"{seconds * 1e6:.2f} us"
+    if seconds < 1:
+        return f"{seconds * 1e3:.2f} ms"
+    return f"{seconds:.2f} s"
+
+
+def fory_serialize(fory: pyfory.Fory, obj: Any) -> None:
+    fory.serialize(obj)
+
+
+def fory_deserialize(fory: pyfory.Fory, binary: bytes) -> None:
+    fory.deserialize(binary)
+
+
+def pickle_serialize(obj: Any) -> None:
+    pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
+
+
+def pickle_deserialize(binary: bytes) -> None:
+    pickle.loads(binary)
+
+
+def protobuf_serialize(bench_pb2, datatype: str, obj: Any) -> None:
+    to_pb, _, _ = PROTO_CONVERTERS[datatype]
+    pb_obj = to_pb(bench_pb2, obj)
+    pb_obj.SerializeToString()
+
+
+def protobuf_deserialize(bench_pb2, datatype: str, binary: bytes) -> None:
+    _, from_pb, pb_type_name = PROTO_CONVERTERS[datatype]
+    pb_cls = getattr(bench_pb2, pb_type_name)
+    pb_obj = pb_cls()
+    pb_obj.ParseFromString(binary)
+    from_pb(pb_obj)
+
+
+def benchmark_name(serializer: str, datatype: str, operation: str) -> str:
+    return f"BM_{SERIALIZER_LABELS[serializer]}_{DATA_LABELS[datatype]}_{operation.capitalize()}"
+
+
+def build_case(
+    serializer: str,
+    operation: str,
+    datatype: str,
+    obj: Any,
+    *,
+    fory: pyfory.Fory,
+    bench_pb2,
+) -> Tuple[Callable[..., Any], Tuple[Any, ...]]:
+    if serializer == "fory":
+        if operation == "serialize":
+            return fory_serialize, (fory, obj)
+        return fory_deserialize, (fory, fory.serialize(obj))
+
+    if serializer == "pickle":
+        if operation == "serialize":
+            return pickle_serialize, (obj,)
+        return pickle_deserialize, (
+            pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL),
+        )
+
+    if serializer == "protobuf":
+        if operation == "serialize":
+            return protobuf_serialize, (bench_pb2, datatype, obj)
+        to_pb, _, _ = PROTO_CONVERTERS[datatype]
+        pb_binary = to_pb(bench_pb2, obj).SerializeToString()
+        return protobuf_deserialize, (bench_pb2, datatype, pb_binary)
+
+    raise ValueError(f"Unsupported serializer: {serializer}")
+
+
+def calculate_serialized_sizes(
+    benchmark_data: Dict[str, Any],
+    selected_datatypes: Iterable[str],
+    *,
+    fory: pyfory.Fory,
+    bench_pb2,
+) -> Dict[str, Dict[str, int]]:
+    sizes: Dict[str, Dict[str, int]] = {}
+    for datatype in selected_datatypes:
+        obj = benchmark_data[datatype]
+        datatype_sizes: Dict[str, int] = {}
+
+        datatype_sizes["fory"] = len(fory.serialize(obj))
+        datatype_sizes["pickle"] = len(
+            pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
+        )
+
+        to_pb, _, _ = PROTO_CONVERTERS[datatype]
+        datatype_sizes["protobuf"] = len(to_pb(bench_pb2, obj).SerializeToString())
+
+        sizes[datatype] = datatype_sizes
+    return sizes
+
+
+def parse_csv_list(value: str, allowed: Iterable[str], default: List[str]) -> List[str]:
+    if value == "all":
+        return list(default)
+    selected = [item.strip().lower() for item in value.split(",") if item.strip()]
+    invalid = [item for item in selected if item not in allowed]
+    if invalid:
+        raise ValueError(
+            f"Invalid values: {', '.join(invalid)}. Allowed: {', '.join(sorted(allowed))}"
+        )
+    ordered = [item for item in default if item in selected]
+    return ordered
+
+
+def benchmark_number(base_number: int, datatype: str) -> int:
+    scale = {
+        "struct": 1.0,
+        "sample": 0.5,
+        "mediacontent": 0.4,
+        "structlist": 0.25,
+        "samplelist": 0.2,
+        "mediacontentlist": 0.15,
+    }
+    return max(1, int(base_number * scale.get(datatype, 1.0)))
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Comprehensive Fory/Pickle/Protobuf benchmark for Python"
+    )
+    parser.add_argument(
+        "--operation",
+        default="all",
+        choices=["all", "serialize", "deserialize"],
+        help="Benchmark operation: all, serialize, deserialize",
+    )
+    parser.add_argument(
+        "--data",
+        default="all",
+        help="Comma-separated data types: struct,sample,mediacontent,structlist,samplelist,mediacontentlist or all",
+    )
+    parser.add_argument(
+        "--serializer",
+        default="all",
+        help="Comma-separated serializers: fory,pickle,protobuf or all",
+    )
+    parser.add_argument(
+        "--warmup",
+        type=int,
+        default=3,
+        help="Warmup iterations (default: 3)",
+    )
+    parser.add_argument(
+        "--iterations",
+        type=int,
+        default=15,
+        help="Measurement iterations (default: 15)",
+    )
+    parser.add_argument(
+        "--repeat",
+        type=int,
+        default=5,
+        help="Timer repeat count per iteration (default: 5)",
+    )
+    parser.add_argument(
+        "--number",
+        type=int,
+        default=1000,
+        help="Function calls per timer measurement (default: 1000)",
+    )
+    parser.add_argument(
+        "--proto-dir",
+        default=str(Path(__file__).with_name("proto")),
+        help="Directory containing generated bench_pb2.py",
+    )
+    parser.add_argument(
+        "--output-json",
+        default=str(Path(__file__).with_name("results") / "benchmark_results.json"),
+        help="Output JSON file path",
+    )
+    return parser.parse_args()
+
+
+def main() -> int:
+    args = parse_args()
+
+    proto_dir = Path(args.proto_dir)
+    bench_pb2 = load_bench_pb2(proto_dir)
+
+    selected_datatypes = parse_csv_list(args.data, DATA_TYPE_ORDER, DATA_TYPE_ORDER)
+    selected_serializers = parse_csv_list(
+        args.serializer, SERIALIZER_ORDER, SERIALIZER_ORDER
+    )
+    selected_operations = (
+        OPERATION_ORDER if args.operation == "all" else [args.operation]
+    )
+
+    benchmark_data = create_benchmark_data()
+    fory = build_fory()
+
+    print(
+        f"Benchmarking {len(selected_datatypes)} data type(s), "
+        f"{len(selected_serializers)} serializer(s), "
+        f"{len(selected_operations)} operation(s)"
+    )
+    print(
+        f"Warmup={args.warmup}, Iterations={args.iterations}, Repeat={args.repeat}, Number={args.number}"
+    )
+    print("=" * 96)
+
+    results = []
+
+    for datatype in selected_datatypes:
+        obj = benchmark_data[datatype]
+        call_number = benchmark_number(args.number, datatype)
+        for operation in selected_operations:
+            for serializer in selected_serializers:
+                case_name = benchmark_name(serializer, datatype, operation)
+                print(f"Running {case_name} ...", end=" ", flush=True)
+
+                func, func_args = build_case(
+                    serializer,
+                    operation,
+                    datatype,
+                    obj,
+                    fory=fory,
+                    bench_pb2=bench_pb2,
+                )
+                mean, stdev = run_benchmark(
+                    func,
+                    func_args,
+                    warmup=args.warmup,
+                    iterations=args.iterations,
+                    repeat=args.repeat,
+                    number=call_number,
+                )
+
+                results.append(
+                    {
+                        "name": case_name,
+                        "serializer": serializer,
+                        "datatype": datatype,
+                        "operation": operation,
+                        "mean_seconds": mean,
+                        "stdev_seconds": stdev,
+                        "mean_ns": mean * 1e9,
+                        "stdev_ns": stdev * 1e9,
+                        "number": call_number,
+                    }
+                )
+                print(f"{format_time(mean)} ± {format_time(stdev)}")
+
+    sizes = calculate_serialized_sizes(
+        benchmark_data,
+        selected_datatypes,
+        fory=fory,
+        bench_pb2=bench_pb2,
+    )
+
+    output_path = Path(args.output_json)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+
+    payload = {
+        "context": {
+            "python_version": platform.python_version(),
+            "python_implementation": platform.python_implementation(),
+            "platform": platform.platform(),
+            "machine": platform.machine(),
+            "processor": platform.processor() or "Unknown",
+            "enable_fory_debug_output": os.getenv("ENABLE_FORY_DEBUG_OUTPUT", "0"),
+            "warmup": args.warmup,
+            "iterations": args.iterations,
+            "repeat": args.repeat,
+            "number": args.number,
+            "operations": selected_operations,
+            "datatypes": selected_datatypes,
+            "serializers": selected_serializers,
+            "list_size": LIST_SIZE,
+        },
+        "benchmarks": results,
+        "sizes": sizes,
+    }
+
+    with output_path.open("w", encoding="utf-8") as f:
+        json.dump(payload, f, indent=2)
+
+    print("=" * 96)
+    print(f"Benchmark JSON written to: {output_path}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/benchmarks/python/benchmark_report.py b/benchmarks/python/benchmark_report.py
new file mode 100755
index 000000000..8df4298e0
--- /dev/null
+++ b/benchmarks/python/benchmark_report.py
@@ -0,0 +1,439 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Generate plots and Markdown report from Python benchmark JSON results."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import platform
+from collections import defaultdict
+from datetime import datetime
+from pathlib import Path
+from typing import Dict
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+try:
+    import psutil
+
+    HAS_PSUTIL = True
+except ImportError:
+    HAS_PSUTIL = False
+
+
+COLORS = {
+    "fory": "#FF6F01",
+    "pickle": "#4C78A8",
+    "protobuf": "#55BCC2",
+}
+SERIALIZER_ORDER = ["fory", "pickle", "protobuf"]
+SERIALIZER_LABELS = {
+    "fory": "fory",
+    "pickle": "pickle",
+    "protobuf": "protobuf",
+}
+DATATYPE_ORDER = [
+    "struct",
+    "sample",
+    "mediacontent",
+    "structlist",
+    "samplelist",
+    "mediacontentlist",
+]
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Generate markdown report and plots for Python benchmark suite"
+    )
+    parser.add_argument(
+        "--json-file",
+        default="results/benchmark_results.json",
+        help="Benchmark JSON file produced by benchmark.py",
+    )
+    parser.add_argument(
+        "--output-dir",
+        default="results/report",
+        help="Output directory for report and plots",
+    )
+    parser.add_argument(
+        "--plot-prefix",
+        default="",
+        help="Optional image path prefix used in markdown",
+    )
+    return parser.parse_args()
+
+
+def load_json(path: Path) -> Dict:
+    with path.open("r", encoding="utf-8") as f:
+        return json.load(f)
+
+
+def get_system_info() -> Dict[str, str]:
+    info = {
+        "OS": f"{platform.system()} {platform.release()}",
+        "Machine": platform.machine(),
+        "Processor": platform.processor() or "Unknown",
+        "Python": platform.python_version(),
+    }
+    if HAS_PSUTIL:
+        info["CPU Cores (Physical)"] = str(psutil.cpu_count(logical=False))
+        info["CPU Cores (Logical)"] = str(psutil.cpu_count(logical=True))
+        info["Total RAM (GB)"] = str(
+            round(psutil.virtual_memory().total / (1024**3), 2)
+        )
+    return info
+
+
+def format_datatype_label(datatype: str) -> str:
+    mapping = {
+        "struct": "Struct",
+        "sample": "Sample",
+        "mediacontent": "MediaContent",
+        "structlist": "Struct\nList",
+        "samplelist": "Sample\nList",
+        "mediacontentlist": "MediaContent\nList",
+    }
+    return mapping.get(datatype, datatype)
+
+
+def format_datatype_table_label(datatype: str) -> str:
+    mapping = {
+        "struct": "Struct",
+        "sample": "Sample",
+        "mediacontent": "MediaContent",
+        "structlist": "StructList",
+        "samplelist": "SampleList",
+        "mediacontentlist": "MediaContentList",
+    }
+    return mapping.get(datatype, datatype)
+
+
+def format_tps_label(tps: float) -> str:
+    if tps >= 1e9:
+        return f"{tps / 1e9:.2f}G"
+    if tps >= 1e6:
+        return f"{tps / 1e6:.2f}M"
+    if tps >= 1e3:
+        return f"{tps / 1e3:.2f}K"
+    return f"{tps:.0f}"
+
+
+def build_benchmark_matrix(benchmarks):
+    data = defaultdict(lambda: defaultdict(dict))
+    for bench in benchmarks:
+        datatype = bench["datatype"]
+        operation = bench["operation"]
+        serializer = bench["serializer"]
+        data[datatype][operation][serializer] = bench["mean_ns"]
+    return data
+
+
+def plot_datatype(ax, data, datatype: str, operation: str):
+    if datatype not in data or operation not in data[datatype]:
+        ax.set_title(f"{format_datatype_table_label(datatype)} {operation}: no data")
+        ax.axis("off")
+        return
+
+    libs = [
+        lib
+        for lib in SERIALIZER_ORDER
+        if data[datatype][operation].get(lib, 0) and data[datatype][operation][lib] > 0
+    ]
+    if not libs:
+        ax.set_title(f"{format_datatype_table_label(datatype)} {operation}: no data")
+        ax.axis("off")
+        return
+
+    times = [data[datatype][operation][lib] for lib in libs]
+    throughput = [1e9 / t if t > 0 else 0 for t in times]
+
+    x = np.arange(len(libs))
+    bars = ax.bar(
+        x,
+        throughput,
+        color=[COLORS.get(lib, "#999999") for lib in libs],
+        width=0.6,
+    )
+
+    ax.set_xticks(x)
+    ax.set_xticklabels([SERIALIZER_LABELS.get(lib, lib) for lib in libs])
+    ax.set_ylabel("Throughput (ops/sec)")
+    ax.set_title(f"{operation.capitalize()} Throughput (higher is better)")
+    ax.grid(True, axis="y", linestyle="--", alpha=0.45)
+    ax.ticklabel_format(style="scientific", axis="y", scilimits=(0, 0))
+
+    for bar, val in zip(bars, throughput):
+        ax.annotate(
+            format_tps_label(val),
+            xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
+            xytext=(0, 3),
+            textcoords="offset points",
+            ha="center",
+            va="bottom",
+            fontsize=9,
+        )
+
+
+def plot_combined_subplot(ax, data, datatypes, operation: str, title: str):
+    available_dts = [dt for dt in datatypes if operation in data.get(dt, {})]
+    if not available_dts:
+        ax.set_title(f"{title}\nNo Data")
+        ax.axis("off")
+        return
+
+    x = np.arange(len(available_dts))
+    available_libs = [
+        lib
+        for lib in SERIALIZER_ORDER
+        if any(
+            data.get(dt, {}).get(operation, {}).get(lib, 0) > 0 for dt in available_dts
+        )
+    ]
+    if not available_libs:
+        ax.set_title(f"{title}\nNo Data")
+        ax.axis("off")
+        return
+
+    width = 0.8 / len(available_libs)
+    for idx, lib in enumerate(available_libs):
+        times = [
+            data.get(dt, {}).get(operation, {}).get(lib, 0) for dt in available_dts
+        ]
+        tps = [1e9 / val if val > 0 else 0 for val in times]
+        offset = (idx - (len(available_libs) - 1) / 2) * width
+        ax.bar(
+            x + offset,
+            tps,
+            width,
+            label=SERIALIZER_LABELS.get(lib, lib),
+            color=COLORS.get(lib, "#999999"),
+        )
+
+    ax.set_title(title)
+    ax.set_xticks(x)
+    ax.set_xticklabels([format_datatype_label(dt) for dt in available_dts])
+    ax.grid(True, axis="y", linestyle="--", alpha=0.45)
+    ax.ticklabel_format(style="scientific", axis="y", scilimits=(0, 0))
+    ax.legend()
+
+
+def generate_plots(data, output_dir: Path):
+    plot_images = []
+    operations = ["serialize", "deserialize"]
+
+    datatypes = [dt for dt in DATATYPE_ORDER if dt in data]
+    for datatype in datatypes:
+        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
+        for idx, operation in enumerate(operations):
+            plot_datatype(axes[idx], data, datatype, operation)
+        fig.suptitle(f"{format_datatype_table_label(datatype)} Throughput", fontsize=14)
+        fig.tight_layout(rect=[0, 0, 1, 0.95])
+
+        path = output_dir / f"{datatype}.png"
+        plt.savefig(path, dpi=150)
+        plt.close()
+        plot_images.append((datatype, path))
+
+    non_list_datatypes = [dt for dt in datatypes if not dt.endswith("list")]
+    list_datatypes = [dt for dt in datatypes if dt.endswith("list")]
+
+    fig, axes = plt.subplots(1, 4, figsize=(28, 6))
+    fig.supylabel("Throughput (ops/sec)")
+
+    plot_combined_subplot(
+        axes[0], data, non_list_datatypes, "serialize", "Serialize Throughput"
+    )
+    plot_combined_subplot(
+        axes[1], data, non_list_datatypes, "deserialize", "Deserialize Throughput"
+    )
+    plot_combined_subplot(
+        axes[2], data, list_datatypes, "serialize", "Serialize Throughput (*List)"
+    )
+    plot_combined_subplot(
+        axes[3], data, list_datatypes, "deserialize", "Deserialize Throughput (*List)"
+    )
+
+    fig.tight_layout()
+    throughput_path = output_dir / "throughput.png"
+    plt.savefig(throughput_path, dpi=150)
+    plt.close()
+    plot_images.append(("throughput", throughput_path))
+
+    return plot_images
+
+
+def generate_markdown_report(
+    raw, data, sizes, plot_images, output_dir: Path, plot_prefix: str
+):
+    context = raw.get("context", {})
+    system_info = get_system_info()
+
+    if context.get("python_implementation"):
+        system_info["Python Implementation"] = context["python_implementation"]
+    if context.get("platform"):
+        system_info["Benchmark Platform"] = context["platform"]
+
+    datatypes = [dt for dt in DATATYPE_ORDER if dt in data]
+    operations = ["serialize", "deserialize"]
+
+    md = [
+        "# Python Benchmark Performance Report\n\n",
+        f"_Generated on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}_\n\n",
+        "## How to Generate This Report\n\n",
+        "```bash\n",
+        "cd benchmarks/python\n",
+        "./run.sh\n",
+        "```\n\n",
+        "## Hardware & OS Info\n\n",
+        "| Key | Value |\n",
+        "|-----|-------|\n",
+    ]
+
+    for key, value in system_info.items():
+        md.append(f"| {key} | {value} |\n")
+
+    md.append("\n## Benchmark Configuration\n\n")
+    md.append("| Key | Value |\n")
+    md.append("|-----|-------|\n")
+    for key in ["warmup", "iterations", "repeat", "number", "list_size"]:
+        if key in context:
+            md.append(f"| {key} | {context[key]} |\n")
+
+    md.append("\n## Benchmark Plots\n")
+    md.append("\nAll plots show throughput (ops/sec); higher is better.\n")
+
+    plot_images_sorted = sorted(
+        plot_images, key=lambda item: (0 if item[0] == "throughput" else 1, item[0])
+    )
+    for datatype, image_path in plot_images_sorted:
+        image_name = os.path.basename(image_path)
+        image_ref = f"{plot_prefix}{image_name}"
+        md.append(f"\n### {datatype.replace('_', ' ').title()}\n\n")
+        md.append(f'<p align="center">\n<img src="{image_ref}" width="90%" />\n</p>\n')
+
+    md.append("\n## Benchmark Results\n\n")
+    md.append("### Timing Results (nanoseconds)\n\n")
+    md.append(
+        "| Datatype | Operation | fory (ns) | pickle (ns) | protobuf (ns) | Fastest |\n"
+    )
+    md.append(
+        "|----------|-----------|-----------|-------------|---------------|---------|\n"
+    )
+
+    for datatype in datatypes:
+        for operation in operations:
+            times = {
+                lib: data.get(datatype, {}).get(operation, {}).get(lib, 0)
+                for lib in SERIALIZER_ORDER
+            }
+            valid = {lib: val for lib, val in times.items() if val > 0}
+            fastest = min(valid, key=valid.get) if valid else "N/A"
+            md.append(
+                "| "
+                + f"{format_datatype_table_label(datatype)} | {operation.capitalize()} | "
+                + " | ".join(
+                    f"{times[lib]:.1f}" if times[lib] > 0 else "N/A"
+                    for lib in SERIALIZER_ORDER
+                )
+                + f" | {SERIALIZER_LABELS.get(fastest, fastest)} |\n"
+            )
+
+    md.append("\n### Throughput Results (ops/sec)\n\n")
+    md.append(
+        "| Datatype | Operation | fory TPS | pickle TPS | protobuf TPS | Fastest |\n"
+    )
+    md.append(
+        "|----------|-----------|----------|------------|--------------|---------|\n"
+    )
+
+    for datatype in datatypes:
+        for operation in operations:
+            times = {
+                lib: data.get(datatype, {}).get(operation, {}).get(lib, 0)
+                for lib in SERIALIZER_ORDER
+            }
+            tps = {lib: (1e9 / val if val > 0 else 0) for lib, val in times.items()}
+            valid_tps = {lib: val for lib, val in tps.items() if val > 0}
+            fastest = max(valid_tps, key=valid_tps.get) if valid_tps else "N/A"
+            md.append(
+                "| "
+                + f"{format_datatype_table_label(datatype)} | {operation.capitalize()} | "
+                + " | ".join(
+                    f"{tps[lib]:,.0f}" if tps[lib] > 0 else "N/A"
+                    for lib in SERIALIZER_ORDER
+                )
+                + f" | {SERIALIZER_LABELS.get(fastest, fastest)} |\n"
+            )
+
+    if sizes:
+        md.append("\n### Serialized Data Sizes (bytes)\n\n")
+        md.append("| Datatype | fory | pickle | protobuf |\n")
+        md.append("|----------|------|--------|----------|\n")
+
+        for datatype in datatypes:
+            datatype_sizes = sizes.get(datatype, {})
+            row = []
+            for lib in SERIALIZER_ORDER:
+                value = datatype_sizes.get(lib, -1)
+                row.append(str(value) if value is not None and value >= 0 else "N/A")
+            md.append(
+                f"| {format_datatype_table_label(datatype)} | "
+                + " | ".join(row)
+                + " |\n"
+            )
+
+    report_path = output_dir / "README.md"
+    report_path.write_text("".join(md), encoding="utf-8")
+    return report_path
+
+
+def main() -> int:
+    args = parse_args()
+
+    json_file = Path(args.json_file)
+    output_dir = Path(args.output_dir)
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    raw = load_json(json_file)
+    benchmarks = raw.get("benchmarks", [])
+    sizes = raw.get("sizes", {})
+
+    data = build_benchmark_matrix(benchmarks)
+    plot_images = generate_plots(data, output_dir)
+
+    report_path = generate_markdown_report(
+        raw,
+        data,
+        sizes,
+        plot_images,
+        output_dir,
+        args.plot_prefix,
+    )
+
+    print(f"Plots saved in: {output_dir}")
+    print(f"Markdown report generated at: {report_path}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
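The report derives every throughput figure from the mean per-call latency via `ops/sec = 1e9 / mean_ns`, then compacts bar labels with the `format_tps_label` helper shown in the diff. A quick standalone illustration (the `417.9` ns input is taken from the fory Struct serialize row in the generated report):

```python
def format_tps_label(tps: float) -> str:
    # Same compaction scheme as the report helper: G/M/K suffixes, two decimals.
    if tps >= 1e9:
        return f"{tps / 1e9:.2f}G"
    if tps >= 1e6:
        return f"{tps / 1e6:.2f}M"
    if tps >= 1e3:
        return f"{tps / 1e3:.2f}K"
    return f"{tps:.0f}"


# Throughput is the reciprocal of mean per-call latency in nanoseconds.
mean_ns = 417.9           # fory Struct serialize, from the report tables
tps = 1e9 / mean_ns       # ≈ 2.39 million ops/sec
label = format_tps_label(tps)
```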
diff --git a/benchmarks/python/run.sh b/benchmarks/python/run.sh
new file mode 100755
index 000000000..ff28f81ce
--- /dev/null
+++ b/benchmarks/python/run.sh
@@ -0,0 +1,198 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -euo pipefail
+export ENABLE_FORY_DEBUG_OUTPUT=0
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+PYTHON_BIN="${PYTHON_BIN:-python3}"
+OUTPUT_DIR="$SCRIPT_DIR/results"
+REPORT_DIR="$OUTPUT_DIR/report"
+PROTO_DIR="$SCRIPT_DIR/proto"
+DOCS_DIR="$SCRIPT_DIR/../../docs/benchmarks/python"
+
+DATA=""
+SERIALIZER=""
+OPERATION="all"
+WARMUP=3
+ITERATIONS=15
+REPEAT=5
+NUMBER=1000
+COPY_DOCS=true
+
+usage() {
+  cat <<'EOF'
+Usage: ./run.sh [options]
+
+Run Python comprehensive benchmarks for struct/sample/mediacontent and list variants.
+
+Options:
+  --data <type>          Filter by data type: struct,sample,mediacontent,structlist,samplelist,mediacontentlist
+  --serializer <name>    Filter by serializer: fory,pickle,protobuf
+  --operation <op>       all|serialize|deserialize (default: all)
+  --warmup <n>           Warmup iterations (default: 3)
+  --iterations <n>       Measurement iterations (default: 15)
+  --repeat <n>           Repeat count per iteration (default: 5)
+  --number <n>           Inner loop call count (default: 1000)
+  --no-copy-docs         Skip copying report/plots into docs/benchmarks/python
+  -h, --help             Show this help message
+
+Examples:
+  ./run.sh
+  ./run.sh --data struct --serializer fory
+  ./run.sh --operation serialize --iterations 30 --repeat 8
+EOF
+}
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --data)
+      DATA="$2"
+      shift 2
+      ;;
+    --serializer)
+      SERIALIZER="$2"
+      shift 2
+      ;;
+    --operation)
+      OPERATION="$2"
+      shift 2
+      ;;
+    --warmup)
+      WARMUP="$2"
+      shift 2
+      ;;
+    --iterations)
+      ITERATIONS="$2"
+      shift 2
+      ;;
+    --repeat)
+      REPEAT="$2"
+      shift 2
+      ;;
+    --number)
+      NUMBER="$2"
+      shift 2
+      ;;
+    --no-copy-docs)
+      COPY_DOCS=false
+      shift
+      ;;
+    -h|--help)
+      usage
+      exit 0
+      ;;
+    *)
+      echo "Unknown option: $1"
+      usage
+      exit 1
+      ;;
+  esac
+done
+
+if ! command -v "$PYTHON_BIN" >/dev/null 2>&1; then
+  echo "Error: $PYTHON_BIN is not available"
+  exit 1
+fi
+
+if ! command -v protoc >/dev/null 2>&1; then
+  echo "Error: protoc is required. Install protobuf compiler first."
+  echo "  macOS: brew install protobuf"
+  exit 1
+fi
+
+echo "============================================"
+echo "Python Comprehensive Benchmark"
+echo "============================================"
+
+echo "Checking runtime dependencies..."
+if ! "$PYTHON_BIN" -c "import pyfory" >/dev/null 2>&1; then
+  echo "Error: pyfory is not installed in current Python environment."
+  echo "Install it with: cd python && pip install -e ."
+  exit 1
+fi
+
+if ! "$PYTHON_BIN" -c "import google.protobuf" >/dev/null 2>&1; then
+  echo "Installing benchmark dependency: protobuf"
+  "$PYTHON_BIN" -m pip install protobuf
+fi
+
+if ! "$PYTHON_BIN" -c "import matplotlib, numpy" >/dev/null 2>&1; then
+  echo "Installing report dependencies: matplotlib numpy psutil"
+  "$PYTHON_BIN" -m pip install matplotlib numpy psutil
+fi
+
+mkdir -p "$PROTO_DIR" "$OUTPUT_DIR" "$REPORT_DIR"
+
+if [[ ! -f "$PROTO_DIR/__init__.py" ]]; then
+  touch "$PROTO_DIR/__init__.py"
+fi
+
+echo "Generating Python protobuf bindings..."
+protoc \
+  --proto_path="$SCRIPT_DIR/../proto" \
+  --python_out="$PROTO_DIR" \
+  "$SCRIPT_DIR/../proto/bench.proto"
+
+BENCH_JSON="$OUTPUT_DIR/benchmark_results.json"
+BENCH_CMD=(
+  "$PYTHON_BIN" "$SCRIPT_DIR/benchmark.py"
+  --proto-dir "$PROTO_DIR"
+  --output-json "$BENCH_JSON"
+  --operation "$OPERATION"
+  --warmup "$WARMUP"
+  --iterations "$ITERATIONS"
+  --repeat "$REPEAT"
+  --number "$NUMBER"
+)
+
+if [[ -n "$DATA" ]]; then
+  BENCH_CMD+=(--data "$DATA")
+fi
+if [[ -n "$SERIALIZER" ]]; then
+  BENCH_CMD+=(--serializer "$SERIALIZER")
+fi
+
+echo ""
+echo "Running benchmark..."
+"${BENCH_CMD[@]}"
+
+echo ""
+echo "Generating report..."
+"$PYTHON_BIN" "$SCRIPT_DIR/benchmark_report.py" \
+  --json-file "$BENCH_JSON" \
+  --output-dir "$REPORT_DIR"
+
+if [[ "$COPY_DOCS" == true ]]; then
+  mkdir -p "$DOCS_DIR"
+  cp "$REPORT_DIR/README.md" "$DOCS_DIR/README.md"
+  cp "$REPORT_DIR"/*.png "$DOCS_DIR/" 2>/dev/null || true
+  echo "Copied report and plots to: $DOCS_DIR"
+fi
+
+echo ""
+echo "============================================"
+echo "Benchmark complete!"
+echo "============================================"
+echo "Benchmark JSON: $BENCH_JSON"
+echo "Report: $REPORT_DIR/README.md"
+if [[ "$COPY_DOCS" == true ]]; then
+  echo "Docs sync: $DOCS_DIR"
+fi
diff --git a/docs/benchmarks/python/README.md b/docs/benchmarks/python/README.md
new file mode 100644
index 000000000..2908fb876
--- /dev/null
+++ b/docs/benchmarks/python/README.md
@@ -0,0 +1,127 @@
+# Python Benchmark Performance Report
+
+_Generated on 2026-03-03 13:42:38_
+
+## How to Generate This Report
+
+```bash
+cd benchmarks/python
+./run.sh
+```
+
+## Hardware & OS Info
+
+| Key                   | Value                        |
+| --------------------- | ---------------------------- |
+| OS                    | Darwin 24.6.0                |
+| Machine               | arm64                        |
+| Processor             | arm                          |
+| Python                | 3.10.8                       |
+| CPU Cores (Physical)  | 12                           |
+| CPU Cores (Logical)   | 12                           |
+| Total RAM (GB)        | 48.0                         |
+| Python Implementation | CPython                      |
+| Benchmark Platform    | macOS-15.7.2-arm64-arm-64bit |
+
+## Benchmark Configuration
+
+| Key        | Value |
+| ---------- | ----- |
+| warmup     | 3     |
+| iterations | 15    |
+| repeat     | 5     |
+| number     | 1000  |
+| list_size  | 5     |
+
+## Benchmark Plots
+
+All plots show throughput (ops/sec); higher is better.
+
+### Throughput
+
+<p align="center">
+<img src="throughput.png" width="90%" />
+</p>
+
+### Mediacontent
+
+<p align="center">
+<img src="mediacontent.png" width="90%" />
+</p>
+
+### Mediacontentlist
+
+<p align="center">
+<img src="mediacontentlist.png" width="90%" />
+</p>
+
+### Sample
+
+<p align="center">
+<img src="sample.png" width="90%" />
+</p>
+
+### Samplelist
+
+<p align="center">
+<img src="samplelist.png" width="90%" />
+</p>
+
+### Struct
+
+<p align="center">
+<img src="struct.png" width="90%" />
+</p>
+
+### Structlist
+
+<p align="center">
+<img src="structlist.png" width="90%" />
+</p>
+
+## Benchmark Results
+
+### Timing Results (nanoseconds)
+
+| Datatype         | Operation   | fory (ns) | pickle (ns) | protobuf (ns) | Fastest |
+| ---------------- | ----------- | --------- | ----------- | ------------- | ------- |
+| Struct           | Serialize   | 417.9     | 868.9       | 548.9         | fory    |
+| Struct           | Deserialize | 516.1     | 910.6       | 742.4         | fory    |
+| Sample           | Serialize   | 828.1     | 1663.5      | 2383.7        | fory    |
+| Sample           | Deserialize | 1282.4    | 2296.3      | 3992.7        | fory    |
+| MediaContent     | Serialize   | 1139.9    | 2859.7      | 2867.1        | fory    |
+| MediaContent     | Deserialize | 1719.5    | 2854.3      | 3236.1        | fory    |
+| StructList       | Serialize   | 1009.1    | 2630.6      | 3281.6        | fory    |
+| StructList       | Deserialize | 1387.2    | 2651.9      | 3547.9        | fory    |
+| SampleList       | Serialize   | 2828.3    | 5541.0      | 15256.6       | fory    |
+| SampleList       | Deserialize | 5043.4    | 8144.7      | 18912.5       | fory    |
+| MediaContentList | Serialize   | 3417.9    | 9341.9      | 15853.2       | fory    |
+| MediaContentList | Deserialize | 6138.7    | 8435.3      | 16442.6       | fory    |
+
+### Throughput Results (ops/sec)
+
+| Datatype         | Operation   | fory TPS  | pickle TPS | protobuf TPS | Fastest |
+| ---------------- | ----------- | --------- | ---------- | ------------ | ------- |
+| Struct           | Serialize   | 2,393,086 | 1,150,946  | 1,821,982    | fory    |
+| Struct           | Deserialize | 1,937,707 | 1,098,170  | 1,346,915    | fory    |
+| Sample           | Serialize   | 1,207,542 | 601,144    | 419,511      | fory    |
+| Sample           | Deserialize | 779,789   | 435,489    | 250,460      | fory    |
+| MediaContent     | Serialize   | 877,300   | 349,688    | 348,780      | fory    |
+| MediaContent     | Deserialize | 581,563   | 350,354    | 309,018      | fory    |
+| StructList       | Serialize   | 991,017   | 380,145    | 304,732      | fory    |
+| StructList       | Deserialize | 720,901   | 377,081    | 281,855      | fory    |
+| SampleList       | Serialize   | 353,574   | 180,473    | 65,545       | fory    |
+| SampleList       | Deserialize | 198,280   | 122,780    | 52,875       | fory    |
+| MediaContentList | Serialize   | 292,578   | 107,045    | 63,079       | fory    |
+| MediaContentList | Deserialize | 162,902   | 118,550    | 60,818       | fory    |
+
+### Serialized Data Sizes (bytes)
+
+| Datatype         | fory | pickle | protobuf |
+| ---------------- | ---- | ------ | -------- |
+| Struct           | 72   | 126    | 61       |
+| Sample           | 517  | 793    | 375      |
+| MediaContent     | 470  | 586    | 301      |
+| StructList       | 205  | 420    | 315      |
+| SampleList       | 1810 | 2539   | 1890     |
+| MediaContentList | 1756 | 1377   | 1520     |
diff --git a/docs/benchmarks/python/mediacontent.png b/docs/benchmarks/python/mediacontent.png
new file mode 100644
index 000000000..05b28cd20
Binary files /dev/null and b/docs/benchmarks/python/mediacontent.png differ
diff --git a/docs/benchmarks/python/mediacontentlist.png b/docs/benchmarks/python/mediacontentlist.png
new file mode 100644
index 000000000..6ca7b1814
Binary files /dev/null and b/docs/benchmarks/python/mediacontentlist.png differ
diff --git a/docs/benchmarks/python/sample.png b/docs/benchmarks/python/sample.png
new file mode 100644
index 000000000..eb318e1ab
Binary files /dev/null and b/docs/benchmarks/python/sample.png differ
diff --git a/docs/benchmarks/python/samplelist.png b/docs/benchmarks/python/samplelist.png
new file mode 100644
index 000000000..94896bcab
Binary files /dev/null and b/docs/benchmarks/python/samplelist.png differ
diff --git a/docs/benchmarks/python/struct.png b/docs/benchmarks/python/struct.png
new file mode 100644
index 000000000..c8fe09cfe
Binary files /dev/null and b/docs/benchmarks/python/struct.png differ
diff --git a/docs/benchmarks/python/structlist.png b/docs/benchmarks/python/structlist.png
new file mode 100644
index 000000000..e421b1738
Binary files /dev/null and b/docs/benchmarks/python/structlist.png differ
diff --git a/docs/benchmarks/python/throughput.png b/docs/benchmarks/python/throughput.png
new file mode 100644
index 000000000..c750a9ffb
Binary files /dev/null and b/docs/benchmarks/python/throughput.png differ
diff --git a/python/pyfory/serialization.pyx b/python/pyfory/serialization.pyx
index 9b8c73098..e28aa06da 100644
--- a/python/pyfory/serialization.pyx
+++ b/python/pyfory/serialization.pyx
@@ -48,7 +48,7 @@ from libc.stdint cimport *
 from libcpp.vector cimport vector
 from libcpp.memory cimport shared_ptr
 from cpython cimport PyObject
-from cpython.object cimport PyTypeObject
+from cpython.object cimport PyTypeObject, PyObject_GetAttr, PyObject_SetAttr
 from cpython.dict cimport PyDict_Next
 from cpython.ref cimport *
 from cpython.list cimport PyList_New, PyList_SET_ITEM
diff --git a/python/pyfory/struct.pxi b/python/pyfory/struct.pxi
index 63a3871cc..2e024b3fa 100644
--- a/python/pyfory/struct.pxi
+++ b/python/pyfory/struct.pxi
@@ -296,18 +296,11 @@ cdef class DataClassSerializer(Serializer):
         cdef object field_name
         cdef FieldRuntimeInfo *field_info
 
-        if self.fory.compatible:
-            for i in range(field_count):
-                field_info = &self._field_runtime_infos[i]
-                field_name = <object> field_info.field_name
-                field_value = value_dict.get(field_name)
-                self._write_field_value(buffer, field_info, field_value)
-        else:
-            for i in range(field_count):
-                field_info = &self._field_runtime_infos[i]
-                field_name = <object> field_info.field_name
-                field_value = value_dict[field_name]
-                self._write_field_value(buffer, field_info, field_value)
+        for i in range(field_count):
+            field_info = &self._field_runtime_infos[i]
+            field_name = <object> field_info.field_name
+            field_value = value_dict[field_name]
+            self._write_field_value(buffer, field_info, field_value)
 
     cdef inline void _write_slots(self, Buffer buffer, object value):
         cdef Py_ssize_t i
@@ -320,13 +313,13 @@ cdef class DataClassSerializer(Serializer):
             for i in range(field_count):
                 field_info = &self._field_runtime_infos[i]
                 field_name = <object> field_info.field_name
-                field_value = getattr(value, field_name, None)
+                field_value = PyObject_GetAttr(value, field_name)
                 self._write_field_value(buffer, field_info, field_value)
         else:
             for i in range(field_count):
                 field_info = &self._field_runtime_infos[i]
                 field_name = <object> field_info.field_name
-                field_value = getattr(value, field_name)
+                field_value = PyObject_GetAttr(value, field_name)
                 self._write_field_value(buffer, field_info, field_value)
 
     cdef inline void _write_field_value(self, Buffer buffer, FieldRuntimeInfo *field_info, object field_value):
@@ -421,7 +414,7 @@ cdef class DataClassSerializer(Serializer):
             if field_info.field_exists == 0:
                 continue
             field_name = <object> field_info.field_name
-            setattr(obj, field_name, field_value)
+            PyObject_SetAttr(obj, field_name, field_value)
 
     cdef inline object _read_field_value(self, Buffer buffer, FieldRuntimeInfo *field_info):
         cdef uint8_t type_id = field_info.basic_type_id
@@ -460,4 +453,4 @@ cdef class DataClassSerializer(Serializer):
         cdef object default_factory
 
         for field_name, default_factory in self._missing_field_defaults:
-            setattr(obj, field_name, default_factory())
+            PyObject_SetAttr(obj, field_name, default_factory())
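
The hunks above replace Python-level `getattr`/`setattr` calls with the CPython C-API `PyObject_GetAttr`/`PyObject_SetAttr` to cut attribute-access overhead in the hot serialization path. One behavioral subtlety worth noting: `PyObject_GetAttr` behaves like two-argument `getattr`, so the old three-argument form `getattr(value, field_name, None)` in the compatible-mode branch no longer suppresses a missing attribute. A minimal pure-Python sketch of that difference (the `Point` class here is a hypothetical example, not Fory code):

```python
# Illustrates the semantic change from getattr(obj, name, None) to
# PyObject_GetAttr(obj, name): the C-API call, like two-argument getattr,
# raises AttributeError on a missing attribute instead of returning None.

class Point:
    __slots__ = ("x", "y")

    def __init__(self, x):
        self.x = x  # "y" is deliberately left unset

p = Point(1)

# Old compatible-mode path: the three-argument form swallows the miss.
assert getattr(p, "y", None) is None

# New path: two-argument getattr mirrors PyObject_GetAttr, so the miss
# surfaces as an AttributeError.
try:
    getattr(p, "y")
except AttributeError:
    raised = True
else:
    raised = False
assert raised
```

This is only a concern for objects serialized with fields unset; for fully initialized dataclasses both forms return the same values, and the C-API call avoids the name lookup and argument packing of the builtin.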


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
