chaokunyang commented on code in PR #3734: URL: https://github.com/apache/fory/pull/3734#discussion_r3418719545
##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,182 @@
+<!--
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    https://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Apache Fory — Threat Model (v0 draft)
+
+## §1 Header
+
+- **Project:** Apache Fory (`apache/fory`), `main`, against which this draft was written. Fory is a multi-language serialization framework (Java, C++, Python, Go, Rust, JavaScript, Kotlin, Scala, Swift, Dart, C#).
+- **Date:** 2026-06-02. **Status:** draft — for Apache Fory PMC review. **Author:** ASF Security team (drafted via the Scovetta threat-model rubric), for PMC ratification.
+- **Version binding:** versioned with the project; a report against Fory version *N* is triaged against the model as it stood at *N*, not at HEAD.
+- **Reporting cross-reference:** findings that violate a §8 property should be reported privately per the ASF process (`[email protected]` → `[email protected]`); findings under §3 or §9 are closed citing this document.
+- **Provenance legend:** *(documented)* = stated in Fory's own docs/repo; *(maintainer)* = confirmed by a Fory PMC member through this process; *(inferred)* = reasoned from architecture/domain knowledge, not yet confirmed — every *(inferred)* claim has a matching §14 open question.
+- **Draft confidence:** ~20 documented / 0 maintainer / ~26 inferred.
+- **What Fory is:** Apache Fory is a high-performance, multi-language object/data serialization framework. An application uses it in-process to serialize its objects to bytes and deserialize bytes back into objects, either within one language ("native" mode) or across languages ("xlang" mode), with optional zero-copy and a row format. *(documented — README, docs/guide)*
+
+## §2 Scope and intended use
+
+- **Primary use:** an **in-process library** linked into a host application that calls `serialize()` / `deserialize()` on its own data types. *(documented — guides)*
+- **It is not a network service or daemon.** It has no listening surface, no auth, no users — the embedding application owns where the bytes come from and go. *(inferred)*
+- **Caller / trust level:** a single caller — the embedding application — which is **trusted** (it links the library and registers its types). The security-relevant question is not "who calls Fory" but **"where do the bytes handed to `deserialize()` come from"** — trusted producer, or attacker-controlled. *(inferred; the registration guidance is documented)*
+
+**Component-family table** *(in/out of this model):*
+
+| Family | Entry point | Notes | In model? |
+| --- | --- | --- | --- |
+| Object-graph serialization (native, per language) | `fory.serialize` / `deserialize` | the core; instantiates registered types from bytes | **In** *(documented)* |
+| Cross-language (xlang) serialization | xlang `serialize`/`deserialize` | type mapping across languages | **In** *(documented)* |
+| Row format / zero-copy | row encoders | reads fields in place from a buffer | **In** *(documented)* |
+| Class/type registration + "secure mode" | `requireClassRegistration`, `register(...)` | the primary defense | **In** *(documented)* |
+| Per-language implementations | `java/`, `cpp/`, `python/`, `go/`, `rust/`, `javascript/`, `kotlin/`, `scala/`, `swift/`, `dart/`, `csharp/` | each is a separate impl of the same model | **In** — but memory-safety profile differs by language (see §5/§8) *(documented: dirs exist)* |
+| `examples/`, `benchmarks/`, `integration_tests/` | demo/bench/test | not production surface | **Out** *(see §3)* |
+
+## §3 Out of scope (explicit non-goals)
+
+- **The integrity / authenticity / confidentiality of the serialized bytes.** Fory deserializes what it is given; it does not authenticate, MAC, or encrypt payloads. If bytes can be tampered with in transit/at rest, that is the application's problem to solve (sign/encrypt before handing to Fory). *(inferred)*
+- **Anything when the caller disables class registration on an untrusted payload source.** `requireClassRegistration(false)` is a documented, deliberately-available footgun; using it against attacker-controlled bytes is out of the model's protection (see §5a/§9). *(documented — config: "Disabling may allow unknown classes to be deserialized, potentially causing security risks")*

Review Comment:
Disabling class registration should take arbitrary class/gadget materialization out of the model, but not every security outcome.
The current deserialization model still treats runtime safety, disproportionate allocation, cleanup/retained state, and explicit policy bypasses as security boundaries for untrusted bytes. A crash, OOM, retained-state bug, or native memory-safety issue should not become out-of-model only because the reproducer uses registration disabled.
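To make the distinction concrete, here is a minimal sketch of an allocation bound that holds regardless of any type-registration policy. It is an illustration only: the wire format (a 4-byte big-endian length followed by the field bytes), the `MAX_FIELD_BYTES` cap, and the names `PayloadError` and `read_length_prefixed` are all hypothetical and are not Fory's format or API.

```python
import struct

# Hypothetical cap; a real limit would be application or library policy.
MAX_FIELD_BYTES = 1 << 20


class PayloadError(ValueError):
    """Raised when a payload violates a decoding limit."""


def read_length_prefixed(buf: bytes, offset: int = 0,
                         max_len: int = MAX_FIELD_BYTES):
    """Read one field from a toy format: 4-byte big-endian length, then data.

    Returns (field_bytes, next_offset). The declared length is validated
    against both the policy cap and the bytes actually present before any
    slice is taken, so a forged length cannot drive a large allocation.
    """
    if len(buf) - offset < 4:
        raise PayloadError("truncated length prefix")
    (declared,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    if declared > max_len:
        raise PayloadError("declared length exceeds policy cap")
    if declared > len(buf) - start:
        raise PayloadError("declared length exceeds remaining bytes")
    return buf[start:start + declared], start + declared
```

Nothing in this check consults a type registry, which is the point: a bound like this would apply to untrusted bytes whether or not class registration is required.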

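The first §3 non-goal in the quoted diff leaves payload integrity to the application ("sign/encrypt before handing to Fory"). A minimal sketch of that pattern, using only the Python standard library, is below. The helper names `seal` and `open_sealed`, the tag-then-payload layout, and the choice of HMAC-SHA256 are assumptions made for illustration; the payload is treated as opaque bytes and no serializer is called.

```python
import hashlib
import hmac

# Length of an HMAC-SHA256 tag in bytes (32).
TAG_LEN = hashlib.sha256().digest_size


def seal(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag computed over it."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Verify the tag and return the payload; raise ValueError on mismatch.

    Only bytes returned from this function would be passed on to a
    deserializer.
    """
    if len(sealed) < TAG_LEN:
        raise ValueError("sealed message shorter than tag")
    tag, payload = sealed[:TAG_LEN], sealed[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch")
    return payload
```

This only establishes that the bytes came from a holder of the key. It does not make a decoder robust, so limits on allocation and runtime safety still matter for the deserializer itself.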