weiqingy commented on issue #845: URL: https://github.com/apache/flink-agents/issues/845#issuecomment-4700838227
## Verification outcome: `bytes` is checkpoint-safe → admitting it

Verified the full question (does Python `bytes` survive the Pemja → Flink state path as a native, checkpoint-stable JVM type?) two independent ways, both agreeing:

**1. Pemja source (the conversion logic).** In Pemja 0.5.5 (`src/main/c/pemja/core/pyutils.c`), the Python→Java dispatch `JcpPyObject_AsJObject` routes `PyBytes_CheckExact` to `JcpPyBytes_AsJObject`, whose body is `NewByteArray` + `SetByteArrayRegion`: a genuine JVM `byte[]` with no native back-pointer. `bytearray` has no branch and falls through to the generic `JcpPyObject_AsJPyObject` wrapper (`Py_INCREF` + a process-local pointer), which is the unsafe case that crashes on restore. The conversion is byte-for-byte identical in 0.5.5 and 0.5.7.

**2. Runtime probe (the actual materialization).** A throwaway Java test driving a real embedded Pemja interpreter materialized the values and inspected their Java types:

- `b"hello"` → Java `[B` (`byte[]`, len 5): safe
- `bytearray(...)` → `pemja.core.object.PyObject`: unsafe wrapper
- `str` → `java.lang.String`: known-good baseline

This is the `Python-object → Java-object` conversion that `FlinkMemoryObject.set()` invokes through its `j_memory_object.set(path, value)` bridge, so it reflects the real `memory.set` path.

**Why that settles restore safety.** `byte[]` is a first-class Flink-serializable primitive array, so once `bytes` materializes as `byte[]` it joins the already-proven `byte[]` checkpoint path. A literal checkpoint-restart round-trip still can't run on the MiniCluster (in-place recovery doesn't recreate the JVM, so the Pemja conversion path isn't crossed); a permanent end-to-end assertion is deferred to the recovery harness in #836 (noted there).

**Key narrowing: exact type only.** The safe Pemja branch is gated on `PyBytes_CheckExact`, so only exact `bytes` is safe; `bytearray` and `bytes` subclasses wrap as `PyObject`.
This lines up exactly with the validator's existing exact-type check (`type(value) in _CHECKPOINT_STABLE_SCALARS`), so admitting `bytes` is a one-line addition: exact `bytes` is accepted, and `bytearray` and `bytes` subclasses stay rejected for free.

PR: #846 (validator + accept/reject tests pinning the exact-type boundary + contract docs).
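To illustrate why an exact-type check gives the right boundary for free, here is a minimal Python sketch. Only `_CHECKPOINT_STABLE_SCALARS` and the `type(value) in ...` test come from the comment; the set's contents (assumed here to be just `str` plus the newly admitted `bytes`), the function name `is_checkpoint_stable_scalar`, and the `TaggedBytes` subclass are hypothetical. The actual validator in #846 may differ.

```python
# Hypothetical sketch: names and set contents are assumptions, not the #846 code.

# Assumed allow-list: str (the known-good baseline) plus the newly admitted bytes.
# The real _CHECKPOINT_STABLE_SCALARS likely contains other scalar types too.
_CHECKPOINT_STABLE_SCALARS = frozenset({str, bytes})


def is_checkpoint_stable_scalar(value: object) -> bool:
    """Return True only if the exact type of value is in the allow-list.

    `type(value) in ...` ignores subclasses on purpose, mirroring Pemja's
    PyBytes_CheckExact gate: only exact bytes is converted to a JVM byte[],
    while bytearray and bytes subclasses are wrapped as a process-local PyObject.
    """
    return type(value) in _CHECKPOINT_STABLE_SCALARS


class TaggedBytes(bytes):
    """Hypothetical bytes subclass used to exercise the exact-type boundary."""


if __name__ == "__main__":
    cases = {
        'b"hello"': b"hello",
        "bytearray": bytearray(b"hello"),
        "bytes subclass": TaggedBytes(b"hello"),
        "str": "hello",
    }
    for label, value in cases.items():
        verdict = "accepted" if is_checkpoint_stable_scalar(value) else "rejected"
        print(f"{label}: {verdict}")
```

Running it prints `accepted` for `b"hello"` and `str`, and `rejected` for `bytearray` and the `bytes` subclass. An `isinstance(value, bytes)` check would be the wrong choice here: it returns `True` for `TaggedBytes`, which would admit a value that Pemja wraps as `PyObject` rather than converting to `byte[]`.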
