weiqingy opened a new pull request, #846: URL: https://github.com/apache/flink-agents/pull/846
Linked issue: #845 ### Purpose of change Admits exact Python `bytes` into the `set()`-time memory value validator (`validate_memory_value`), which previously rejected it. `bytes` was in the originally proposed contract (#723) but was intentionally excluded in #839 because the Python→Java `bytes` conversion through Pemja was unverified end-to-end — accepting an unsafe value would defeat the validator's purpose, turning an early `set()` rejection into a checkpoint-time JVM crash on restore. This change closes that gap. Verification shows exact Python `bytes` materializes through Pemja into a native Java `byte[]`, which is checkpoint-stable: - Pemja 0.5.5 C source (`src/main/c/pemja/core/pyutils.c`): the Python→Java dispatch `JcpPyObject_AsJObject` routes `PyBytes_CheckExact` to `JcpPyBytes_AsJObject`, whose body is `NewByteArray` + `SetByteArrayRegion` — a genuine JVM `byte[]` with no native back-pointer. `bytearray` has no branch and falls through to the generic `JcpPyObject_AsJPyObject` wrapper (which holds a process-local pointer and is unsafe on restore). The conversion is identical in 0.5.5 and 0.5.7. - A local embedded-Pemja probe driving a real interpreter confirmed it at runtime: `b"hello"` materializes on the Java side as `byte[]` (`[B`), while `bytearray` materializes as `pemja.core.object.PyObject` and `str` as `java.lang.String`. Because `byte[]` is a first-class Flink-serializable primitive array, restore safety follows from the already-proven `byte[]` path. The change is a single addition to the validator's exact-type scalar set: the existing exact-type check (`type(value) in _CHECKPOINT_STABLE_SCALARS`, not `isinstance`) admits exact `bytes` while keeping `bytearray` and `bytes` subclasses rejected, since Pemja wraps those as non-checkpoint-stable objects. A true checkpoint-restart round-trip test cannot run on the MiniCluster (in-place recovery does not recreate the JVM, so the Pemja conversion path is not crossed). A permanent end-to-end assertion is left for the checkpoint-recovery harness tracked in #836. ### Tests `python/flink_agents/runtime/tests/test_memory_value_validation.py`: - Accept: exact `bytes` as a scalar (`b""`, `b"hello"`) and nested in `list`/`dict`. - Reject: a new `test_rejects_bytearray_and_bytes_subclass` (using a real `bytes` subclass and `bytearray`) that pins the exact-type boundary — an `isinstance`-based regression would wrongly accept the subclass and fail this test. Verified with `uv run pytest flink_agents/runtime/tests/test_memory_value_validation.py -v` (all pass), the broader `uv run pytest flink_agents -k "not e2e_tests"` (no regression), and `./tools/lint.sh -c` (0 violations). ### API No new or changed public API signatures. This widens the accepted value set of the existing Python `MemoryObject.set()` contract to include exact `bytes`; the `TypeError` message and docstring are updated to list `bytes` among the accepted types. ### Documentation - [ ] `doc-needed` - [ ] `doc-not-needed` - [x] `doc-included` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
