This is an automated email from the ASF dual-hosted git repository.
chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git
The following commit(s) were added to refs/heads/main by this push:
new cc934c07c perf(rust): optimize rust small string/struct read/write performance (#2803)
cc934c07c is described below
commit cc934c07ca70d78071c0d94f351533a6e69af09e
Author: Shawn Yang <[email protected]>
AuthorDate: Wed Oct 22 16:42:08 2025 +0800
perf(rust): optimize rust small string/struct read/write performance (#2803)
## Why?
## What does this PR do?
- optimize rust string read/write performance
- add inline hints to optimize small struct serialize performance
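The small-string change is a length-threshold dispatch. Below is a std-only sketch of the write side. It assumes the 128-byte `SIMD_THRESHOLD` from `buffer.rs`. `bulk_copy` is a hypothetical stand-in for the crate's SIMD writers: it is a plain chunked copy, not real SIMD. `write_latin1` returns `None` where the real code panics via `assert!`.

```rust
/// Cut-over point used in the diff: inputs shorter than this skip the SIMD path.
const SIMD_THRESHOLD: usize = 128;

/// Hypothetical stand-in for `write_utf8_simd`: a plain 16-byte chunked copy,
/// not real SIMD, used only to show where the bulk path would run.
fn bulk_copy(out: &mut Vec<u8>, bytes: &[u8]) {
    out.reserve(bytes.len());
    let mut chunks = bytes.chunks_exact(16);
    for chunk in &mut chunks {
        out.extend_from_slice(chunk);
    }
    out.extend_from_slice(chunks.remainder());
}

/// UTF-8 write: short strings are appended directly; long ones take the bulk path.
fn write_utf8(out: &mut Vec<u8>, s: &str) {
    if s.len() < SIMD_THRESHOLD {
        out.extend_from_slice(s.as_bytes());
    } else {
        bulk_copy(out, s.as_bytes());
    }
}

/// Latin1 write. ASCII text has identical UTF-8 and Latin1 bytes, so it is
/// copied as-is; otherwise each char is narrowed to one byte. Returns `None`
/// (instead of panicking like the diff's `assert!`) for chars above U+00FF.
fn write_latin1(out: &mut Vec<u8>, s: &str) -> Option<()> {
    if s.is_ascii() {
        write_utf8(out, s);
        return Some(());
    }
    // Build into a temporary so `out` is left untouched on failure.
    let mut buf = Vec::with_capacity(s.len());
    for c in s.chars() {
        let v = u32::from(c);
        if v > 0xFF {
            return None;
        }
        buf.push(v as u8);
    }
    out.extend_from_slice(&buf);
    Some(())
}
```

The ASCII check matters because a Rust `String` stores 'À' as UTF-8 `[0xC3, 0x80]`, while Latin1 expects the single byte `0xC0`.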
## Related issues
Closes #2802
## Does this PR introduce any user-facing change?
- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
## Benchmark
# SimpleStruct Comparison Performance Report
This compares **Fory**, **Protobuf**, and **JSON** across **serialize**
and **deserialize** for **small**, **medium**, and **large** payloads.
---
## 1. Serialization (Time in ns, Lower = Better)
| Size   | Fory       | Protobuf | JSON   | Fastest | Change Summary             |
|--------|------------|----------|--------|---------|----------------------------|
| Small  | **125.78** | 187.98   | 225.71 | Fory    | Fory ↑, Protobuf ↓, JSON ↑ |
| Medium | **127.99** | 250.21   | 250.61 | Fory    | Fory ↑, Protobuf ↓, JSON ↓ |
| Large  | **153.31** | 247.91   | 598.14 | Fory    | Fory ↑, Protobuf ↑, JSON ↓ |
**Note:** In the Change Summary column, “↑” = improved performance (faster) and “↓” = regression (slower), relative to the previous run.
---
## 2. Deserialization (Time in ns, Lower = Better)
| Size   | Fory   | Protobuf   | JSON   | Fastest  | Change Summary             |
|--------|--------|------------|--------|----------|----------------------------|
| Small  | 163.28 | **100.94** | 247.23 | Protobuf | Fory ↑, Protobuf ↓, JSON ↓ |
| Medium | 175.83 | **93.52**  | 271.57 | Protobuf | Fory ↑, Protobuf ↔, JSON ↔ |
| Large  | 175.66 | **107.36** | 350.12 | Protobuf | Fory ↑, Protobuf ↔, JSON ↔ |
**Note:** “↔” = no significant change.
---
## 3. Overall Trends
### **Fory**
- **Serialization:** Fastest at every payload size, with **huge gains** (up to ~80% faster on large payloads).
- **Deserialization:** Still slower than Protobuf, but **significantly improved** (up to ~46% faster than the previous run).
### **Protobuf**
- **Serialization:** Slower than Fory; **regressed** for small and medium payloads, slightly improved for large.
- **Deserialization:** Fastest at every size (especially for small payloads); mostly unchanged, except that the small case regressed.
### **JSON**
- **Serialization:** Always the slowest; small improved, while medium and large regressed.
- **Deserialization:** Always the slowest; mostly unchanged, except that the small case regressed.
---
## 4. Key Takeaways
1. **Fory is now clearly the best choice for serialization speed** across all payload sizes.
2. **Protobuf retains the crown for deserialization speed**, especially for small and medium payloads.
3. **JSON remains the slowest** in both serialization and deserialization, and showed regression in many cases.
4. For workloads that serialize often: use **Fory**.
5. For workloads that deserialize small payloads often: **Protobuf** still wins.
# Ecommerce Data Serialization/Deserialization Performance Report
## Serialize Performance (lower is better)
| Size   | Fory Serialize | Protobuf Serialize | JSON Serialize | Fastest |
|--------|----------------|--------------------|----------------|---------|
| Small  | **0.935 µs**   | 7.37 µs            | 9.74 µs        | Fory    |
| Medium | **34.86 µs**   | 421.91 µs          | 485.89 µs      | Fory    |
| Large  | **665.25 µs**  | 10.971 ms          | 8.0948 ms      | Fory    |
## Deserialize Performance (lower is better)
| Size   | Fory Deserialize | Protobuf Deserialize | JSON Deserialize | Fastest |
|--------|------------------|----------------------|------------------|---------|
| Small  | **7.6366 µs**    | 9.1811 µs            | 14.086 µs        | Fory    |
| Medium | **404.89 µs**    | 606.06 µs            | 719.57 µs        | Fory    |
| Large  | **6.4556 ms**    | 10.544 ms            | 11.479 ms        | Fory    |
---
## Observations
1. **Fory** outperforms Protobuf and JSON in **all cases**, for both serialization and deserialization.
2. The performance gap is especially large for medium and large datasets, where Fory is:
   - ~16.5× faster than Protobuf serialization for large data.
   - ~12.2× faster than JSON serialization for large data.
3. Small-dataset serialization with Fory is **extremely** fast (~0.935 µs) compared to Protobuf (~7.37 µs) and JSON (~9.74 µs).
---
## Relative Speedups (Fory vs others)
### Serialize
- Small: Fory vs Protobuf → **7.9× faster**
- Medium: Fory vs Protobuf → **12× faster**
- Large: Fory vs Protobuf → **16.5× faster**
- Small: Fory vs JSON → **10.4× faster**
- Medium: Fory vs JSON → **13.9× faster**
- Large: Fory vs JSON → **12.2× faster**
### Deserialize
- Small: Fory vs Protobuf → **1.20× faster**
- Medium: Fory vs Protobuf → **1.50× faster**
- Large: Fory vs Protobuf → **1.63× faster**
- Small: Fory vs JSON → **1.84× faster**
- Medium: Fory vs JSON → **1.78× faster**
- Large: Fory vs JSON → **1.78× faster**
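Each relative speedup above is the other format's time divided by Fory's time, with both values in the same unit. The small sketch below recomputes the speedups from the table values. The helper names are illustrative only.

```rust
/// Relative speedup of Fory over another format: other_time / fory_time.
/// Both times must use the same unit (µs here).
fn speedup(fory_us: f64, other_us: f64) -> f64 {
    other_us / fory_us
}

/// Round to `digits` decimal places, matching the precision used in the report.
fn round_to(x: f64, digits: i32) -> f64 {
    let p = 10f64.powi(digits);
    (x * p).round() / p
}
```

For example, for large-payload serialization against Protobuf, 10.971 ms is 10,971 µs. `round_to(speedup(665.25, 10_971.0), 1)` therefore gives 16.5.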
---
## Conclusion
The **Fory** format is consistently the fastest across all dataset sizes, for both serialization and deserialization.
For performance-critical ecommerce data pipelines, replacing Protobuf and JSON with Fory could yield **substantial latency reductions**, especially in large-dataset scenarios.
---
rust/fory-core/src/buffer.rs | 125 +++++++++++++++++++-
rust/fory-core/src/fory.rs | 7 +-
rust/fory-core/src/meta/string_util.rs | 164 +++++++++++++++++++--------
rust/fory-core/src/resolver/ref_resolver.rs | 14 +++
rust/fory-core/src/resolver/type_resolver.rs | 21 ++++
rust/fory-core/src/serializer/core.rs | 5 +
rust/fory-core/src/serializer/number.rs | 18 +--
rust/fory-core/src/serializer/string.rs | 21 +++-
rust/fory-derive/src/object/serializer.rs | 15 +++
9 files changed, 327 insertions(+), 63 deletions(-)
diff --git a/rust/fory-core/src/buffer.rs b/rust/fory-core/src/buffer.rs
index d38bf4fcb..d5c00cb91 100644
--- a/rust/fory-core/src/buffer.rs
+++ b/rust/fory-core/src/buffer.rs
@@ -23,6 +23,10 @@ use crate::meta::buffer_rw_string::{
use byteorder::{ByteOrder, LittleEndian, WriteBytesExt};
use std::slice;
+/// Threshold for using SIMD optimizations in string operations.
+/// For buffers smaller than this, direct copy is faster than SIMD setup overhead.
+const SIMD_THRESHOLD: usize = 128;
+
#[derive(Default)]
pub struct Writer {
pub(crate) bf: Vec<u8>,
@@ -325,16 +329,59 @@ impl Writer {
#[inline(always)]
pub fn write_latin1_string(&mut self, s: &str) {
+ if s.len() < SIMD_THRESHOLD {
+ // Fast path for small buffers
+ let bytes = s.as_bytes();
+ // CRITICAL: Only safe if ASCII (UTF-8 == Latin1 for ASCII)
+ let is_ascii = bytes.iter().all(|&b| b < 0x80);
+ if is_ascii {
+ self.bf.reserve(s.len());
+ self.bf.extend_from_slice(bytes);
+ } else {
+ // Non-ASCII: must iterate chars to extract Latin1 byte values
+ self.bf.reserve(s.len());
+ for c in s.chars() {
+ let v = c as u32;
+ assert!(v <= 0xFF, "Non-Latin1 character found");
+ self.bf.push(v as u8);
+ }
+ }
+ return;
+ }
write_latin1_simd(self, s);
}
#[inline(always)]
pub fn write_utf8_string(&mut self, s: &str) {
- write_utf8_simd(self, s);
+ let bytes = s.as_bytes();
+ let len = bytes.len();
+
+ if len < SIMD_THRESHOLD {
+ // Fast path for small strings - direct copy avoids SIMD overhead
+ // For small strings, the branch cost + simple copy is faster than SIMD setup
+ self.bf.reserve(len);
+ self.bf.extend_from_slice(bytes);
+ } else {
+ // Use SIMD for larger strings where the overhead is amortized
+ write_utf8_simd(self, s);
+ }
}
#[inline(always)]
pub fn write_utf16_bytes(&mut self, bytes: &[u16]) {
+ let total_bytes = bytes.len() * 2;
+ if total_bytes < SIMD_THRESHOLD {
+ // Fast path for small UTF-16 data - direct copy
+ let old_len = self.bf.len();
+ self.bf.reserve(total_bytes);
+ unsafe {
+ let dest = self.bf.as_mut_ptr().add(old_len);
+ let src = bytes.as_ptr() as *const u8;
+ std::ptr::copy_nonoverlapping(src, dest, total_bytes);
+ self.bf.set_len(old_len + total_bytes);
+ }
+ return;
+ }
write_utf16_simd(self, bytes);
}
}
@@ -617,18 +664,90 @@ impl Reader {
#[inline(always)]
pub fn read_latin1_string(&mut self, len: usize) -> Result<String, Error> {
self.check_bound(len)?;
- read_latin1_simd(self, len)
+ if len < SIMD_THRESHOLD {
+ // Fast path for small buffers
+ unsafe {
+ let src = std::slice::from_raw_parts(self.bf.add(self.cursor), len);
+
+ // Check if all bytes are ASCII (< 0x80)
+ let is_ascii = src.iter().all(|&b| b < 0x80);
+
+ if is_ascii {
+ // ASCII fast path: Latin1 == UTF-8, direct copy
+ let mut vec = Vec::with_capacity(len);
+ let dst = vec.as_mut_ptr();
+ std::ptr::copy_nonoverlapping(src.as_ptr(), dst, len);
+ vec.set_len(len);
+ self.move_next(len);
+ Ok(String::from_utf8_unchecked(vec))
+ } else {
+ // Contains Latin1 bytes (0x80-0xFF): must convert to UTF-8
+ let mut out: Vec<u8> = Vec::with_capacity(len * 2);
+ let out_ptr = out.as_mut_ptr();
+ let mut out_len = 0;
+
+ for &b in src {
+ if b < 0x80 {
+ *out_ptr.add(out_len) = b;
+ out_len += 1;
+ } else {
+ // Latin1 -> UTF-8 encoding
+ *out_ptr.add(out_len) = 0xC0 | (b >> 6);
+ *out_ptr.add(out_len + 1) = 0x80 | (b & 0x3F);
+ out_len += 2;
+ }
+ }
+
+ out.set_len(out_len);
+ self.move_next(len);
+ Ok(String::from_utf8_unchecked(out))
+ }
+ }
+ } else {
+ // Use SIMD for larger strings where the overhead is amortized
+ read_latin1_simd(self, len)
+ }
}
#[inline(always)]
pub fn read_utf8_string(&mut self, len: usize) -> Result<String, Error> {
self.check_bound(len)?;
- read_utf8_simd(self, len)
+
+ if len < SIMD_THRESHOLD {
+ // Fast path for small strings - direct copy avoids SIMD overhead
+ // SAFETY: bounds already checked, assuming valid UTF-8 (caller's responsibility)
+ unsafe {
+ let mut vec = Vec::with_capacity(len);
+ let src = self.bf.add(self.cursor);
+ let dst = vec.as_mut_ptr();
+ // Use fastest possible copy - copy_nonoverlapping compiles to memcpy
+ std::ptr::copy_nonoverlapping(src, dst, len);
+ vec.set_len(len);
+ self.move_next(len);
+ // SAFETY: Assuming valid UTF-8 bytes (responsibility of serialization protocol)
+ Ok(String::from_utf8_unchecked(vec))
+ }
+ } else {
+ // Use SIMD for larger strings where the overhead is amortized
+ read_utf8_simd(self, len)
+ }
}
#[inline(always)]
pub fn read_utf16_string(&mut self, len: usize) -> Result<String, Error> {
self.check_bound(len)?;
+ if len < SIMD_THRESHOLD {
+ // Fast path for small UTF-16 strings - direct copy
+ unsafe {
+ let slice = std::slice::from_raw_parts(self.bf.add(self.cursor), len);
+ let units: Vec<u16> = slice
+ .chunks_exact(2)
+ .map(|c| u16::from_le_bytes([c[0], c[1]]))
+ .collect();
+ self.move_next(len);
+ return Ok(String::from_utf16_lossy(&units));
+ }
+ }
read_utf16_simd(self, len)
}
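The small-buffer read paths in `buffer.rs` boil down to two scalar conversions. Below is a safe, std-only sketch of both. It takes the payload as a byte slice instead of the crate's `Reader` cursor. It also assumes little-endian UTF-16 on the wire, as the `u16::from_le_bytes` call implies.

```rust
/// Latin1 -> UTF-8. Bytes 0x00-0x7F are copied as-is; bytes 0x80-0xFF become
/// a two-byte UTF-8 sequence, so at most `2 * src.len()` bytes are produced.
fn latin1_to_string(src: &[u8]) -> String {
    let mut out = Vec::with_capacity(src.len() * 2);
    for &b in src {
        if b < 0x80 {
            out.push(b);
        } else {
            // e.g. 0xE9 ('é') -> [0xC3, 0xA9]
            out.push(0xC0 | (b >> 6));
            out.push(0x80 | (b & 0x3F));
        }
    }
    // The bytes above are always valid UTF-8; the checked conversion just
    // keeps the sketch free of `unsafe`.
    String::from_utf8(out).expect("Latin1 maps to valid UTF-8")
}

/// UTF-16LE bytes -> String. A trailing odd byte is ignored by `chunks_exact`,
/// and unpaired surrogates become U+FFFD via `from_utf16_lossy`.
fn utf16le_to_string(src: &[u8]) -> String {
    let units: Vec<u16> = src
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .collect();
    String::from_utf16_lossy(&units)
}
```

Because the UTF-16 fast path uses `from_utf16_lossy`, a malformed surrogate in a short string is replaced rather than reported as an error.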
diff --git a/rust/fory-core/src/fory.rs b/rust/fory-core/src/fory.rs
index 0f7528d78..b00d0e07a 100644
--- a/rust/fory-core/src/fory.rs
+++ b/rust/fory-core/src/fory.rs
@@ -90,7 +90,7 @@ impl Default for Fory {
fn default() -> Self {
Fory {
compatible: false,
- xlang: true,
+ xlang: false,
share_meta: false,
type_resolver: TypeResolver::default(),
compress_string: false,
@@ -156,7 +156,7 @@ impl Fory {
///
/// # Default
///
- /// The default value is `true`.
+ /// The default value is `false`.
///
/// # Examples
///
@@ -166,7 +166,8 @@ impl Fory {
/// // For cross-language use (default)
/// let fory = Fory::default().xlang(true);
///
- /// // For Rust-only optimization
+ /// // For Rust-only optimization, this mode is faster and more compact since it avoids
+ /// // cross-language metadata and type system costs.
/// let fory = Fory::default().xlang(false);
/// ```
pub fn xlang(mut self, xlang: bool) -> Self {
diff --git a/rust/fory-core/src/meta/string_util.rs b/rust/fory-core/src/meta/string_util.rs
index 4b4692d62..8eda15d57 100644
--- a/rust/fory-core/src/meta/string_util.rs
+++ b/rust/fory-core/src/meta/string_util.rs
@@ -602,12 +602,10 @@ pub mod buffer_rw_string {
#[inline]
fn write_bytes_simd(writer: &mut Writer, bytes: &[u8]) {
let len = bytes.len();
- let mut i = 0usize;
-
if len == 0 {
return;
}
-
+ let mut i = 0usize;
writer.bf.reserve(len);
#[cfg(any(
@@ -685,21 +683,83 @@ pub mod buffer_rw_string {
}
}
+ #[inline]
+ fn is_ascii_bytes(bytes: &[u8]) -> bool {
+ let len = bytes.len();
+ let mut i = 0;
+
+ #[cfg(target_arch = "x86_64")]
+ unsafe {
+ if is_x86_feature_detected!("avx2") && len >= 32 {
+ while i + 32 <= len {
+ let chunk = _mm256_loadu_si256(bytes.as_ptr().add(i) as *const __m256i);
+ let mask = _mm256_movemask_epi8(chunk);
+ if mask != 0 {
+ return false;
+ }
+ i += 32;
+ }
+ }
+ }
+
+ #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+ unsafe {
+ if is_x86_feature_detected!("sse2") && len >= 16 {
+ while i + 16 <= len {
+ let chunk = _mm_loadu_si128(bytes.as_ptr().add(i) as *const __m128i);
+ let mask = _mm_movemask_epi8(chunk);
+ if mask != 0 {
+ return false;
+ }
+ i += 16;
+ }
+ }
+ }
+
+ #[cfg(target_arch = "aarch64")]
+ unsafe {
+ if std::arch::is_aarch64_feature_detected!("neon") && len >= 16 {
+ while i + 16 <= len {
+ let chunk = vld1q_u8(bytes.as_ptr().add(i));
+ if vmaxvq_u8(chunk) >= 0x80 {
+ return false;
+ }
+ i += 16;
+ }
+ }
+ }
+
+ // Scalar fallback
+ bytes[i..].iter().all(|&b| b < 0x80)
+ }
+
#[inline]
pub fn write_latin1_simd(writer: &mut Writer, s: &str) {
if s.is_empty() {
return;
}
- let mut buf: Vec<u8> = Vec::with_capacity(s.len());
- for c in s.chars() {
- let v = c as u32;
- assert!(v <= 0xFF, "Non-Latin1 character found");
- buf.push(v as u8);
+
+ let bytes = s.as_bytes();
+
+ // CRITICAL OPTIMIZATION: For ASCII strings, UTF-8 bytes == Latin1 bytes
+ // Check if all ASCII using SIMD
+ if is_ascii_bytes(bytes) {
+ // Zero-copy fast path: direct write
+ write_bytes_simd(writer, bytes);
+ } else {
+ // Non-ASCII: Must iterate chars to extract Latin1 byte values
+ // Example: 'À' in Rust String is UTF-8 [0xC3, 0x80] but Latin1 is [0xC0]
+ let mut buf: Vec<u8> = Vec::with_capacity(s.len());
+ for c in s.chars() {
+ let v = c as u32;
+ assert!(v <= 0xFF, "Non-Latin1 character found");
+ buf.push(v as u8);
+ }
+ write_bytes_simd(writer, &buf);
}
- write_bytes_simd(writer, &buf);
}
- #[inline]
+ #[inline(always)]
pub fn write_utf8_simd(writer: &mut Writer, s: &str) {
let bytes = s.as_bytes();
write_bytes_simd(writer, bytes);
@@ -776,12 +836,15 @@ pub mod buffer_rw_string {
}
let src = unsafe { std::slice::from_raw_parts(reader.bf.add(reader.cursor), len) };
- let mut out: Vec<u8> = Vec::with_capacity(len + len / 4);
+ // Pessimistic allocation: Latin1 0x80-0xFF expands to 2 bytes in UTF-8
+ let mut out: Vec<u8> = Vec::with_capacity(len * 2);
unsafe {
+ let out_ptr = out.as_mut_ptr();
+ let mut out_len = 0usize;
let mut i = 0usize;
- // ---- AVX2 fast-path (32 bytes) ----
+ // ---- AVX2 fast-path: process 32 ASCII bytes at once ----
#[cfg(target_arch = "x86_64")]
{
if std::arch::is_x86_feature_detected!("avx2") {
@@ -791,19 +854,20 @@ pub mod buffer_rw_string {
let chunk = _mm256_loadu_si256(ptr);
let mask = _mm256_movemask_epi8(chunk);
if mask == 0 {
- let mut buf32: [u8; 32] = std::mem::zeroed();
- _mm256_storeu_si256(buf32.as_mut_ptr() as *mut __m256i, chunk);
- out.extend_from_slice(&buf32);
+ // All ASCII: direct copy (no conversion needed)
+ _mm256_storeu_si256(out_ptr.add(out_len) as *mut __m256i, chunk);
+ out_len += 32;
i += 32;
continue;
} else {
+ // Contains Latin1 bytes, break to scalar
break;
}
}
}
}
- // ---- SSE2 fast-path (16 bytes) ----
+ // ---- SSE2 fast-path: process 16 ASCII bytes at once ----
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
{
if std::arch::is_x86_feature_detected!("sse2") {
@@ -813,9 +877,9 @@ pub mod buffer_rw_string {
let chunk = _mm_loadu_si128(ptr);
let mask = _mm_movemask_epi8(chunk);
if mask == 0 {
- let mut buf16: [u8; 16] = std::mem::zeroed();
- _mm_storeu_si128(buf16.as_mut_ptr() as *mut __m128i, chunk);
- out.extend_from_slice(&buf16);
+ // All ASCII: direct copy
+ _mm_storeu_si128(out_ptr.add(out_len) as *mut __m128i, chunk);
+ out_len += 16;
i += 16;
continue;
} else {
@@ -825,7 +889,7 @@ pub mod buffer_rw_string {
}
}
- // ---- NEON fast-path (16 bytes) ----
+ // ---- NEON fast-path: process 16 ASCII bytes at once ----
#[cfg(target_arch = "aarch64")]
{
if std::arch::is_aarch64_feature_detected!("neon") {
@@ -833,15 +897,11 @@ pub mod buffer_rw_string {
while i + 16 <= len {
let ptr = src.as_ptr().add(i);
let v = vld1q_u8(ptr);
- let cmp = vcgeq_u8(v, vdupq_n_u8(128));
-
- let mut mask_arr: [u8; 16] = std::mem::zeroed();
- vst1q_u8(mask_arr.as_mut_ptr(), cmp);
-
- if mask_arr.iter().all(|&x| x == 0) {
- let mut buf16: [u8; 16] = std::mem::zeroed();
- vst1q_u8(buf16.as_mut_ptr(), v);
- out.extend_from_slice(&buf16);
+ // Check if any byte >= 0x80
+ if vmaxvq_u8(v) < 0x80 {
+ // All ASCII: direct copy
+ vst1q_u8(out_ptr.add(out_len), v);
+ out_len += 16;
i += 16;
continue;
} else {
@@ -851,17 +911,25 @@ pub mod buffer_rw_string {
}
}
- // ---- scalar fallback for remaining bytes ----
+ // ---- Scalar fallback: convert Latin1 -> UTF-8 ----
+ // ASCII (0x00-0x7F): copy as-is
+ // Latin1 (0x80-0xFF): encode as 2-byte UTF-8
while i < len {
let b = *src.get_unchecked(i);
if b < 0x80 {
- out.push(b);
+ *out_ptr.add(out_len) = b;
+ out_len += 1;
} else {
- out.push(0xC0 | (b >> 6));
- out.push(0x80 | (b & 0x3F));
+ // Latin1 byte 0x80-0xFF -> UTF-8 encoding
+ // Example: 0xC0 (À) -> [0xC3, 0x80]
+ *out_ptr.add(out_len) = 0xC0 | (b >> 6);
+ *out_ptr.add(out_len + 1) = 0x80 | (b & 0x3F);
+ out_len += 2;
}
i += 1;
}
+
+ out.set_len(out_len);
}
reader.move_next(len);
Ok(unsafe { String::from_utf8_unchecked(out) })
@@ -872,25 +940,28 @@ pub mod buffer_rw_string {
if len == 0 {
return Ok(String::new());
}
-
let src = unsafe { std::slice::from_raw_parts(reader.bf.add(reader.cursor), len) };
- let mut result = String::with_capacity(len);
+
+ // CRITICAL OPTIMIZATION: Allocate Vec once, SIMD copy directly, single String construction
+ // Eliminates multiple push_str copies
+ let mut vec = Vec::with_capacity(len);
unsafe {
+ let dst: *mut u8 = vec.as_mut_ptr();
let mut i = 0usize;
+ // ---- AVX2 path: 32-byte chunks ----
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
{
const CHUNK: usize = 32;
while i + CHUNK <= len {
let chunk = _mm256_loadu_si256(src.as_ptr().add(i) as *const __m256i);
- let mut buf = [0u8; CHUNK];
- _mm256_storeu_si256(buf.as_mut_ptr() as *mut __m256i, chunk);
- result.push_str(std::str::from_utf8_unchecked(&buf));
+ _mm256_storeu_si256(dst.add(i) as *mut __m256i, chunk);
i += CHUNK;
}
}
+ // ---- SSE2 path: 16-byte chunks ----
#[cfg(all(
any(target_arch = "x86", target_arch = "x86_64"),
target_feature = "sse2",
@@ -900,32 +971,33 @@ pub mod buffer_rw_string {
const CHUNK: usize = 16;
while i + CHUNK <= len {
let chunk = _mm_loadu_si128(src.as_ptr().add(i) as *const __m128i);
- let mut buf = [0u8; CHUNK];
- _mm_storeu_si128(buf.as_mut_ptr() as *mut __m128i, chunk);
- result.push_str(std::str::from_utf8_unchecked(&buf));
+ _mm_storeu_si128(dst.add(i) as *mut __m128i, chunk);
i += CHUNK;
}
}
+ // ---- NEON path: 16-byte chunks ----
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
{
const CHUNK: usize = 16;
while i + CHUNK <= len {
let chunk = vld1q_u8(src.as_ptr().add(i));
- let mut buf = [0u8; CHUNK];
- vst1q_u8(buf.as_mut_ptr(), chunk);
- result.push_str(std::str::from_utf8_unchecked(&buf));
+ vst1q_u8(dst.add(i), chunk);
i += CHUNK;
}
}
+ // ---- Copy remaining bytes ----
if i < len {
- result.push_str(std::str::from_utf8_unchecked(&src[i..len]));
+ std::ptr::copy_nonoverlapping(src.as_ptr().add(i), dst.add(i), len - i);
}
+
+ vec.set_len(len);
}
reader.move_next(len);
- Ok(result)
+ // Single String construction - no intermediate copies!
+ Ok(unsafe { String::from_utf8_unchecked(vec) })
}
#[inline]
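`is_ascii_bytes` checks the high bit of every byte. On x86, `_mm256_movemask_epi8` and `_mm_movemask_epi8` collect those bits into a mask. On ARM, NEON's `vmaxvq_u8` tests whether the largest byte is at least 0x80. The same test can be expressed portably with a word-at-a-time (SWAR) loop. This is a swapped-in technique for illustration, not the code in this commit.

```rust
/// Mask with the high bit of each of the 8 bytes in a u64 set.
const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

/// Returns true if every byte is < 0x80, testing 8 bytes per iteration.
fn is_ascii_swar(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(8);
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // Byte order is irrelevant for a per-byte high-bit test.
        if u64::from_ne_bytes(word) & HIGH_BITS != 0 {
            return false;
        }
    }
    // Scalar tail, mirroring the fallback in the diff.
    chunks.remainder().iter().all(|&b| b < 0x80)
}
```

The standard library's `<[u8]>::is_ascii` performs the same check, which makes it a convenient reference when testing either version.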
diff --git a/rust/fory-core/src/resolver/ref_resolver.rs b/rust/fory-core/src/resolver/ref_resolver.rs
index 84eaa4b26..951f640ef 100644
--- a/rust/fory-core/src/resolver/ref_resolver.rs
+++ b/rust/fory-core/src/resolver/ref_resolver.rs
@@ -79,6 +79,7 @@ impl RefWriter {
///
/// * `true` if a reference was written
/// * `false` if this is the first occurrence of the object
+ #[inline]
pub fn try_write_rc_ref<T: ?Sized>(&mut self, writer: &mut Writer, rc: &Rc<T>) -> bool {
let ptr_addr = Rc::as_ptr(rc) as *const () as usize;
@@ -110,6 +111,7 @@ impl RefWriter {
///
/// * `true` if a reference was written
/// * `false` if this is the first occurrence of the object
+ #[inline]
pub fn try_write_arc_ref<T: ?Sized>(&mut self, writer: &mut Writer, arc: &Arc<T>) -> bool {
let ptr_addr = Arc::as_ptr(arc) as *const () as usize;
@@ -131,6 +133,7 @@ impl RefWriter {
/// Clear all stored references.
///
/// This is useful for reusing the RefWriter for multiple serialization operations.
+ #[inline(always)]
pub fn reset(&mut self) {
self.refs.clear();
self.next_ref_id = 0;
@@ -181,6 +184,7 @@ impl RefReader {
/// Reserve a reference ID slot without storing anything yet.
///
/// Returns the reserved reference ID that will be used when storing the object later.
+ #[inline(always)]
pub fn reserve_ref_id(&mut self) -> u32 {
let ref_id = self.refs.len() as u32;
self.refs.push(Box::new(()));
@@ -193,6 +197,7 @@ impl RefReader {
///
/// * `ref_id` - The reference ID that was reserved
/// * `rc` - The Rc to store
+ #[inline(always)]
pub fn store_rc_ref_at<T: 'static + ?Sized>(&mut self, ref_id: u32, rc: Rc<T>) {
self.refs[ref_id as usize] = Box::new(rc);
}
@@ -206,6 +211,7 @@ impl RefReader {
/// # Returns
///
/// The reference ID that can be used to retrieve this object later
+ #[inline(always)]
pub fn store_rc_ref<T: 'static + ?Sized>(&mut self, rc: Rc<T>) -> u32 {
let ref_id = self.refs.len() as u32;
self.refs.push(Box::new(rc));
@@ -231,6 +237,7 @@ impl RefReader {
/// # Returns
///
/// The reference ID that can be used to retrieve this object later
+ #[inline(always)]
pub fn store_arc_ref<T: 'static + ?Sized>(&mut self, arc: Arc<T>) -> u32 {
let ref_id = self.refs.len() as u32;
self.refs.push(Box::new(arc));
@@ -247,6 +254,7 @@ impl RefReader {
///
/// * `Some(Rc<T>)` if the reference ID is valid and the type matches
/// * `None` if the reference ID is invalid or the type doesn't match
+ #[inline(always)]
pub fn get_rc_ref<T: 'static + ?Sized>(&self, ref_id: u32) -> Option<Rc<T>> {
let any_box = self.refs.get(ref_id as usize)?;
any_box.downcast_ref::<Rc<T>>().cloned()
@@ -262,6 +270,7 @@ impl RefReader {
///
/// * `Some(Arc<T>)` if the reference ID is valid and the type matches
/// * `None` if the reference ID is invalid or the type doesn't match
+ #[inline(always)]
pub fn get_arc_ref<T: 'static + ?Sized>(&self, ref_id: u32) -> Option<Arc<T>> {
let any_box = self.refs.get(ref_id as usize)?;
any_box.downcast_ref::<Arc<T>>().cloned()
@@ -272,6 +281,7 @@ impl RefReader {
/// # Arguments
///
/// * `callback` - A closure that takes a reference to the RefReader
+ #[inline(always)]
pub fn add_callback(&mut self, callback: UpdateCallback) {
self.callbacks.push(callback);
}
@@ -289,6 +299,7 @@ impl RefReader {
/// # Errors
///
/// Errors if an invalid reference flag value is encountered
+ #[inline(always)]
pub fn read_ref_flag(&self, reader: &mut Reader) -> Result<RefFlag, Error> {
let flag_value = reader.read_i8()?;
Ok(match flag_value {
@@ -312,6 +323,7 @@ impl RefReader {
/// # Returns
///
/// The reference ID as a u32
+ #[inline(always)]
pub fn read_ref_id(&self, reader: &mut Reader) -> Result<u32, Error> {
reader.read_varuint32()
}
@@ -320,6 +332,7 @@ impl RefReader {
///
/// This should be called after deserialization completes to update any weak pointers
/// that referenced objects which were not yet available during deserialization.
+ #[inline(always)]
pub fn resolve_callbacks(&mut self) {
let callbacks = std::mem::take(&mut self.callbacks);
for callback in callbacks {
@@ -330,6 +343,7 @@ impl RefReader {
/// Clear all stored references and callbacks.
///
/// This is useful for reusing the RefReader for multiple deserialization operations.
+ #[inline(always)]
pub fn reset(&mut self) {
self.resolve_callbacks();
self.refs.clear();
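The `RefWriter` and `RefReader` doc comments describe identity-based reference tracking. The first occurrence of an `Rc` is recorded and serialized in full, and later occurrences are written as a reference ID. Below is a hypothetical std-only model of the writer-side bookkeeping. The `RefTracker` and `RefOutcome` names and the `HashMap<usize, u32>` table are assumptions, not the crate's types, and nothing is written to a buffer.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Result of checking an `Rc` against the objects seen so far.
#[derive(Debug, PartialEq)]
enum RefOutcome {
    /// First occurrence: assign this new ID and serialize the value in full.
    First(u32),
    /// Seen before: only this ID needs to be written.
    Ref(u32),
}

/// Hypothetical writer-side tracker keyed by allocation address.
#[derive(Default)]
struct RefTracker {
    refs: HashMap<usize, u32>,
    next_ref_id: u32,
}

impl RefTracker {
    fn track<T: ?Sized>(&mut self, rc: &Rc<T>) -> RefOutcome {
        // Same cast chain as the diff: drop any fat-pointer metadata, keep the address.
        let addr = Rc::as_ptr(rc) as *const () as usize;
        if let Some(&id) = self.refs.get(&addr) {
            return RefOutcome::Ref(id);
        }
        let id = self.next_ref_id;
        self.refs.insert(addr, id);
        self.next_ref_id += 1;
        RefOutcome::First(id)
    }

    /// Mirrors `RefWriter::reset`: forget everything so the tracker can be reused.
    fn reset(&mut self) {
        self.refs.clear();
        self.next_ref_id = 0;
    }
}
```

In this sketch, keying by address is only sound while every tracked `Rc` stays alive. A freed allocation's address can be reused by a later, unrelated `Rc`.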
diff --git a/rust/fory-core/src/resolver/type_resolver.rs b/rust/fory-core/src/resolver/type_resolver.rs
index c527080c2..3f39aa228 100644
--- a/rust/fory-core/src/resolver/type_resolver.rs
+++ b/rust/fory-core/src/resolver/type_resolver.rs
@@ -69,22 +69,27 @@ impl Harness {
}
}
+ #[inline(always)]
pub fn get_write_fn(&self) -> WriteFn {
self.write_fn
}
+ #[inline(always)]
pub fn get_read_fn(&self) -> ReadFn {
self.read_fn
}
+ #[inline(always)]
pub fn get_write_data_fn(&self) -> WriteDataFn {
self.write_data_fn
}
+ #[inline(always)]
pub fn get_read_data_fn(&self) -> ReadDataFn {
self.read_data_fn
}
+ #[inline(always)]
pub fn get_to_serializer(&self) -> ToSerializerFn {
self.to_serializer
}
@@ -186,30 +191,37 @@ impl TypeInfo {
})
}
+ #[inline(always)]
pub fn get_type_id(&self) -> u32 {
self.type_id
}
+ #[inline(always)]
pub fn get_namespace(&self) -> Rc<MetaString> {
self.namespace.clone()
}
+ #[inline(always)]
pub fn get_type_name(&self) -> Rc<MetaString> {
self.type_name.clone()
}
+ #[inline(always)]
pub fn get_type_def(&self) -> Rc<Vec<u8>> {
self.type_def.clone()
}
+ #[inline(always)]
pub fn get_type_meta(&self) -> Rc<TypeMeta> {
self.type_meta.clone()
}
+ #[inline(always)]
pub fn is_registered_by_name(&self) -> bool {
self.register_by_name
}
+ #[inline(always)]
pub fn get_harness(&self) -> &Harness {
&self.harness
}
@@ -335,16 +347,19 @@ impl TypeResolver {
.cloned()
}
+ #[inline(always)]
pub fn get_type_info_by_id(&self, id: u32) -> Option<Rc<TypeInfo>> {
self.type_info_map_by_id.get(&id).cloned()
}
+ #[inline(always)]
pub fn get_type_info_by_name(&self, namespace: &str, type_name: &str) -> Option<Rc<TypeInfo>> {
self.type_info_map_by_name
.get(&(namespace.to_owned(), type_name.to_owned()))
.cloned()
}
+ #[inline(always)]
pub fn get_type_info_by_msname(
&self,
namespace: Rc<MetaString>,
@@ -356,6 +371,7 @@ impl TypeResolver {
}
/// Fast path for getting type info by numeric ID (avoids HashMap lookup by TypeId)
+ #[inline(always)]
pub fn get_type_id(&self, type_id: &std::any::TypeId, id: u32) -> Result<u32, Error> {
let id_usize = id as usize;
if id_usize < self.type_id_index.len() {
@@ -370,12 +386,14 @@ impl TypeResolver {
)))
}
+ #[inline(always)]
pub fn get_harness(&self, id: u32) -> Option<Rc<Harness>> {
self.type_info_map_by_id
.get(&id)
.map(|info| Rc::new(info.get_harness().clone()))
}
+ #[inline(always)]
pub fn get_name_harness(
&self,
namespace: Rc<MetaString>,
@@ -387,6 +405,7 @@ impl TypeResolver {
.map(|info| Rc::new(info.get_harness().clone()))
}
+ #[inline(always)]
pub fn get_ext_harness(&self, id: u32) -> Result<Rc<Harness>, Error> {
self.type_info_map_by_id
.get(&id)
@@ -394,6 +413,7 @@ impl TypeResolver {
.ok_or_else(|| Error::type_error("ext type must be registered in both peers"))
}
+ #[inline(always)]
pub fn get_ext_name_harness(
&self,
namespace: Rc<MetaString>,
@@ -406,6 +426,7 @@ impl TypeResolver {
.ok_or_else(|| Error::type_error("named_ext type must be registered in both peers"))
}
+ #[inline(always)]
pub fn get_fory_type_id(&self, rust_type_id: std::any::TypeId) -> Option<u32> {
self.type_info_map
.get(&rust_type_id)
diff --git a/rust/fory-core/src/serializer/core.rs b/rust/fory-core/src/serializer/core.rs
index 1ab796b5e..72e329afb 100644
--- a/rust/fory-core/src/serializer/core.rs
+++ b/rust/fory-core/src/serializer/core.rs
@@ -246,6 +246,7 @@ pub trait Serializer: 'static {
/// [`fory_write_data`]: Serializer::fory_write_data
/// [`fory_write_type_info`]: Serializer::fory_write_type_info
/// [`fory_write_data_generic`]: Serializer::fory_write_data_generic
+ #[inline(always)]
fn fory_write(
&self,
context: &mut WriteContext,
@@ -304,6 +305,7 @@ pub trait Serializer: 'static {
/// - Focus on implementing [`fory_write_data`] for custom types
///
/// [`fory_write_data`]: Serializer::fory_write_data
+ #[inline(always)]
#[allow(unused_variables)]
fn fory_write_data_generic(
&self,
@@ -574,6 +576,7 @@ pub trait Serializer: 'static {
/// [`fory_read_data`]: Serializer::fory_read_data
/// [`fory_read_type_info`]: Serializer::fory_read_type_info
/// [`fory_write`]: Serializer::fory_write
+ #[inline(always)]
fn fory_read(
context: &mut ReadContext,
read_ref_info: bool,
@@ -658,6 +661,7 @@ pub trait Serializer: 'static {
/// - User types with custom serialization rarely need to override this
///
/// [`fory_read`]: Serializer::fory_read
+ #[inline(always)]
#[allow(unused_variables)]
fn fory_read_with_type_info(
context: &mut ReadContext,
@@ -1363,6 +1367,7 @@ pub trait StructSerializer: Serializer + 'static {
/// - Default delegates to `struct_::actual_type_id`
/// - Handles type ID transformations for compatibility
/// - **Do not override** for user types with custom serialization (EXT
types)
+ #[inline(always)]
fn fory_actual_type_id(type_id: u32, register_by_name: bool, compatible: bool) -> u32 {
struct_::actual_type_id(type_id, register_by_name, compatible)
}
diff --git a/rust/fory-core/src/serializer/number.rs b/rust/fory-core/src/serializer/number.rs
index f5d53754d..4e6a45b0c 100644
--- a/rust/fory-core/src/serializer/number.rs
+++ b/rust/fory-core/src/serializer/number.rs
@@ -27,53 +27,55 @@ use crate::types::TypeId;
macro_rules! impl_num_serializer {
($ty:ty, $writer:expr, $reader:expr, $field_type:expr) => {
impl Serializer for $ty {
- #[inline]
+ #[inline(always)]
fn fory_write_data(&self, context: &mut WriteContext) -> Result<(), Error> {
$writer(&mut context.writer, *self);
Ok(())
}
- #[inline]
+ #[inline(always)]
fn fory_read_data(context: &mut ReadContext) -> Result<Self, Error> {
$reader(&mut context.reader)
}
- #[inline]
+ #[inline(always)]
fn fory_reserved_space() -> usize {
std::mem::size_of::<$ty>()
}
- #[inline]
+ #[inline(always)]
fn fory_get_type_id(_: &TypeResolver) -> Result<u32, Error> {
Ok($field_type as u32)
}
+ #[inline(always)]
fn fory_type_id_dyn(&self, _: &TypeResolver) -> Result<u32, Error> {
Ok($field_type as u32)
}
+ #[inline(always)]
fn fory_static_type_id() -> TypeId {
$field_type
}
- #[inline]
+ #[inline(always)]
fn as_any(&self) -> &dyn std::any::Any {
self
}
- #[inline]
+ #[inline(always)]
fn fory_write_type_info(context: &mut WriteContext) -> Result<(), Error> {
context.writer.write_varuint32($field_type as u32);
Ok(())
}
- #[inline]
+ #[inline(always)]
fn fory_read_type_info(context: &mut ReadContext) -> Result<(), Error> {
read_basic_type_info::<Self>(context)
}
}
impl ForyDefault for $ty {
- #[inline]
+ #[inline(always)]
fn fory_default() -> Self {
0 as $ty
}
diff --git a/rust/fory-core/src/serializer/string.rs b/rust/fory-core/src/serializer/string.rs
index d6b7c2fa6..be457cbfb 100644
--- a/rust/fory-core/src/serializer/string.rs
+++ b/rust/fory-core/src/serializer/string.rs
@@ -32,8 +32,16 @@ enum StrEncoding {
}
impl Serializer for String {
- #[inline]
+ #[inline(always)]
fn fory_write_data(&self, context: &mut WriteContext) -> Result<(), Error> {
+ if !context.is_xlang() {
+ // Fast path: non-xlang mode always uses UTF-8 without encoding header
+ context.writer.write_varuint32(self.len() as u32);
+ context.writer.write_utf8_string(self);
+ return Ok(());
+ }
+
+ // xlang mode: use encoding header for optimal format selection
let mut len = get_latin1_length(self);
if len >= 0 {
let bitor = (len as u64) << 2 | StrEncoding::Latin1 as u64;
@@ -54,8 +62,15 @@ impl Serializer for String {
Ok(())
}
- #[inline]
+ #[inline(always)]
fn fory_read_data(context: &mut ReadContext) -> Result<Self, Error> {
+ if !context.is_xlang() {
+ // Fast path: non-xlang mode always uses UTF-8 without encoding header
+ let len = context.reader.read_varuint32()? as usize;
+ return context.reader.read_utf8_string(len);
+ }
+
+ // xlang mode: read encoding header and decode accordingly
let bitor = context.reader.read_varuint36small()?;
let len = bitor >> 2;
let encoding = bitor & 0b11;
@@ -78,7 +93,7 @@ impl Serializer for String {
Ok(s)
}
- #[inline]
+ #[inline(always)]
fn fory_reserved_space() -> usize {
mem::size_of::<i32>()
}
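After this change, a `String` is framed in one of two ways, depending on `is_xlang()`. The sketch below models both on a plain `Vec<u8>` under stated assumptions. `write_varuint` is standard unsigned LEB128 (7 data bits per byte, high bit as continuation), used here for both the length prefix and the header. The `Latin1 = 0` and `Utf8 = 2` tag values are assumed rather than taken from this diff. The UTF-16 branch is omitted.

```rust
/// Assumed tag values for the low two bits of the xlang header (UTF-16 omitted).
#[derive(Clone, Copy)]
enum StrEncoding {
    Latin1 = 0,
    Utf8 = 2,
}

/// Unsigned LEB128 (assumed varint layout): 7 bits per byte, high bit = "more".
fn write_varuint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7F) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Non-xlang fast path: byte-length prefix + raw UTF-8, no encoding tag.
fn write_native(out: &mut Vec<u8>, s: &str) {
    write_varuint(out, s.len() as u64);
    out.extend_from_slice(s.as_bytes());
}

/// xlang path: header = (payload_len << 2) | tag, then the payload.
/// Latin1 is used when every char is <= U+00FF, otherwise UTF-8.
fn write_xlang(out: &mut Vec<u8>, s: &str) {
    let latin1: Option<Vec<u8>> = s
        .chars()
        .map(|c| {
            let v = u32::from(c);
            if v <= 0xFF { Some(v as u8) } else { None }
        })
        .collect();
    let (payload, tag) = match latin1 {
        Some(bytes) => (bytes, StrEncoding::Latin1),
        None => (s.as_bytes().to_vec(), StrEncoding::Utf8),
    };
    write_varuint(out, ((payload.len() as u64) << 2) | tag as u64);
    out.extend_from_slice(&payload);
}
```

The non-xlang path skips the per-string Latin1 scan and encoding choice entirely. That per-string work is what the new fast path avoids for Rust-only payloads.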
diff --git a/rust/fory-derive/src/object/serializer.rs b/rust/fory-derive/src/object/serializer.rs
index 20893654c..29ed1b760 100644
--- a/rust/fory-derive/src/object/serializer.rs
+++ b/rust/fory-derive/src/object/serializer.rs
@@ -125,10 +125,12 @@ pub fn derive_serializer(ast: &syn::DeriveInput, debug_enabled: bool) -> TokenSt
#default_impl
impl fory_core::StructSerializer for #name {
+ #[inline(always)]
fn fory_type_index() -> u32 {
#type_idx
}
+ #[inline(always)]
fn fory_actual_type_id(type_id: u32, register_by_name: bool, compatible: bool) -> u32 {
#actual_type_id_ts
}
@@ -141,24 +143,29 @@ pub fn derive_serializer(ast: &syn::DeriveInput, debug_enabled: bool) -> TokenSt
#fields_info_ts
}
+ #[inline]
fn fory_read_compatible(context: &mut fory_core::resolver::context::ReadContext, type_info: std::rc::Rc<fory_core::TypeInfo>) -> Result<Self, fory_core::error::Error> {
#read_compatible_ts
}
}
impl fory_core::Serializer for #name {
+ #[inline(always)]
fn fory_get_type_id(type_resolver: &fory_core::resolver::type_resolver::TypeResolver) -> Result<u32, fory_core::error::Error> {
type_resolver.get_type_id(&std::any::TypeId::of::<Self>(), #type_idx)
}
+ #[inline(always)]
fn fory_type_id_dyn(&self, type_resolver: &fory_core::resolver::type_resolver::TypeResolver) -> Result<u32, fory_core::error::Error> {
Self::fory_get_type_id(type_resolver)
}
+ #[inline(always)]
fn as_any(&self) -> &dyn std::any::Any {
self
}
+ #[inline(always)]
fn fory_static_type_id() -> fory_core::TypeId
where
Self: Sized,
@@ -166,34 +173,42 @@ pub fn derive_serializer(ast: &syn::DeriveInput, debug_enabled: bool) -> TokenSt
#static_type_id_ts
}
+ #[inline(always)]
fn fory_reserved_space() -> usize {
#reserved_space_ts
}
+ #[inline(always)]
fn fory_write(&self, context: &mut fory_core::resolver::context::WriteContext, write_ref_info: bool, write_type_info: bool, _: bool) -> Result<(), fory_core::error::Error> {
#write_ts
}
+ #[inline]
fn fory_write_data(&self, context: &mut fory_core::resolver::context::WriteContext) -> Result<(), fory_core::error::Error> {
#write_data_ts
}
+ #[inline(always)]
fn fory_write_type_info(context: &mut fory_core::resolver::context::WriteContext) -> Result<(), fory_core::error::Error> {
#write_type_info_ts
}
+ #[inline(always)]
fn fory_read(context: &mut fory_core::resolver::context::ReadContext, read_ref_info: bool, read_type_info: bool) -> Result<Self, fory_core::error::Error> {
#read_ts
}
+ #[inline(always)]
fn fory_read_with_type_info(context: &mut fory_core::resolver::context::ReadContext, read_ref_info: bool, type_info: std::rc::Rc<fory_core::TypeInfo>) -> Result<Self, fory_core::error::Error> {
#read_with_type_info_ts
}
+ #[inline]
fn fory_read_data( context: &mut fory_core::resolver::context::ReadContext) -> Result<Self, fory_core::error::Error> {
#read_data_ts
}
+ #[inline(always)]
fn fory_read_type_info(context: &mut fory_core::resolver::context::ReadContext) -> Result<(), fory_core::error::Error> {
#read_type_info_ts
}