weiqingy opened a new pull request, #846:
URL: https://github.com/apache/flink-agents/pull/846

   Linked issue: #845
   
   ### Purpose of change
   
   Admits exact Python `bytes` into the `set()`-time memory value validator 
(`validate_memory_value`), which previously rejected it.
   
   `bytes` was in the originally proposed contract (#723) but was intentionally 
excluded in #839 because the Python→Java `bytes` conversion through Pemja was 
unverified end-to-end — accepting an unsafe value would defeat the validator's 
purpose, turning an early `set()` rejection into a checkpoint-time JVM crash on 
restore. This change closes that gap.
   
   Verification shows exact Python `bytes` materializes through Pemja into a 
native Java `byte[]`, which is checkpoint-stable:
   
   - Pemja 0.5.5 C source (`src/main/c/pemja/core/pyutils.c`): the Python→Java 
dispatch `JcpPyObject_AsJObject` routes `PyBytes_CheckExact` to 
`JcpPyBytes_AsJObject`, whose body is `NewByteArray` + `SetByteArrayRegion` — a 
genuine JVM `byte[]` with no native back-pointer. `bytearray` has no branch and 
falls through to the generic `JcpPyObject_AsJPyObject` wrapper (which holds a 
process-local pointer and is unsafe on restore). The conversion is identical in 
0.5.5 and 0.5.7.
   - A local embedded-Pemja probe driving a real interpreter confirmed it at 
runtime: `b"hello"` materializes on the Java side as `byte[]` (`[B`), while 
`bytearray` materializes as `pemja.core.object.PyObject` and `str` as 
`java.lang.String`.
   
   Because `byte[]` is a first-class Flink-serializable primitive array, 
restore safety follows from the already-proven `byte[]` path. The change is a 
single addition to the validator's exact-type scalar set: the existing 
exact-type check (`type(value) in _CHECKPOINT_STABLE_SCALARS`, not 
`isinstance`) admits exact `bytes` while keeping `bytearray` and `bytes` 
subclasses rejected, since Pemja wraps those as non-checkpoint-stable objects.
   
   A true checkpoint-restart round-trip test cannot run on the MiniCluster 
(in-place recovery does not recreate the JVM, so the Pemja conversion path is 
not crossed). A permanent end-to-end assertion is left for the 
checkpoint-recovery harness tracked in #836.
   
   ### Tests
   
   `python/flink_agents/runtime/tests/test_memory_value_validation.py`:
   
   - Accept: exact `bytes` as a scalar (`b""`, `b"hello"`) and nested in 
`list`/`dict`.
   - Reject: a new `test_rejects_bytearray_and_bytes_subclass` (using a real 
`bytes` subclass and `bytearray`) that pins the exact-type boundary — an 
`isinstance`-based regression would wrongly accept the subclass and fail this 
test.
   
   Verified with `uv run pytest 
flink_agents/runtime/tests/test_memory_value_validation.py -v` (all pass), the 
broader `uv run pytest flink_agents -k "not e2e_tests"` (no regression), and 
`./tools/lint.sh -c` (0 violations).
   
   ### API
   
   No new or changed public API signatures. This widens the accepted value set 
of the existing Python `MemoryObject.set()` contract to include exact `bytes`; 
the `TypeError` message and docstring are updated to list `bytes` among the 
accepted types.
   
   ### Documentation
   
   - [ ] `doc-needed`
   - [ ] `doc-not-needed`
   - [x] `doc-included`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to