jubins opened a new pull request, #28473: URL: https://github.com/apache/flink/pull/28473
## What is the purpose of the change

Fixes [FLINK-39951](https://issues.apache.org/jira/browse/FLINK-39951): `ArrayConstructor.construct()` uses reference equality (`==`) to compare the pickle typecode string `"l"` instead of value equality (`.equals()`). Since `args[0]` is a deserialized `String` object at runtime and not an interned literal, the `==` comparison always returns `false`, making the `long[]` code path effectively dead code.

As a result, any Python array with typecode `'l'` (64-bit longs) silently falls through to `super.construct()`, which treats the values as `int[]` and truncates them to 32 bits. Users passing long values larger than `Integer.MAX_VALUE` receive silently corrupted data with no error or warning.

The fix replaces `args[0] == "l"` with `"l".equals(args[0])` on line 30 of `ArrayConstructor.java`.

## Brief change log

- Fixed `ArrayConstructor.construct()` to use `"l".equals(args[0])` instead of `args[0] == "l"` for typecode comparison
- Added `ArrayConstructorTest` with four test cases covering: large long values above `Integer.MAX_VALUE`, single value arrays, empty arrays, and delegation to the parent class for non-`'l'` typecodes

## Verifying this change

This change is covered by new unit tests in `ArrayConstructorTest`:

- `testLongArrayPreservesValuesAboveIntMax()`: the core regression test; uses `new String("l")` to guarantee a non-interned reference (proving the old `==` would have failed), then asserts that values like `3_000_000_000L` and `Long.MAX_VALUE` are preserved correctly as `long[]`
- `testLongArrayWithSingleValue()`: verifies a single large long value is handled correctly
- `testLongArrayWithEmptyList()`: verifies empty input produces an empty `long[]`
- `testNonLongTypecodesDelegateToSuper()`: confirms other typecodes (e.g. `'i'`) still fall through to the parent `super.construct()` correctly

## Does this pull request potentially affect one of the following parts

- **Dependencies** (does it add or upgrade a dependency): no
- **The public API**, i.e., is any changed class annotated with `@Public(Evolving)`: no
- **The serializers**: no
- **The runtime per-record code paths** (performance sensitive): no
- **Anything that affects deployment or recovery** (JobManager, Checkpointing, Kubernetes/Yarn, ZooKeeper): no
- **The S3 file system connector**: no

## Documentation

- Does this pull request introduce a new feature? no, it fixes a bug in existing functionality
- If yes, how is the feature documented? not applicable

## Was generative AI tooling used to co-author this PR?

- [x] Yes. Claude Code was used as a pair-programming assistant. All code was written, understood, and verified by the author.

Generated-by: Claude Opus 4.8
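
## Illustration of the bug

The following is a minimal standalone sketch of the failure mode described in this PR, not the actual Flink code. The class `TypecodeEqualityDemo` and its helper methods are hypothetical names introduced only for illustration. How `super.construct()` really produces the `int[]` is not shown in this PR, so the narrowing conversion below is an assumption that merely demonstrates what 32-bit truncation of a large long looks like.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Flink.
class TypecodeEqualityDemo {

    // Mirrors the buggy check: compares object references, not contents.
    static boolean isLongTypecodeByReference(Object typecode) {
        return typecode == "l";
    }

    // Mirrors the fixed check: compares contents and is null-safe.
    static boolean isLongTypecodeByValue(Object typecode) {
        return "l".equals(typecode);
    }

    // Stand-in for the intended long[] path: values keep all 64 bits.
    static long[] toLongArray(List<? extends Number> values) {
        long[] result = new long[values.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(i).longValue();
        }
        return result;
    }

    // Stand-in for an int[] fallback (assumption): narrowing drops the high 32 bits.
    static int[] toIntArray(List<? extends Number> values) {
        int[] result = new int[values.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(i).intValue();
        }
        return result;
    }

    public static void main(String[] args) {
        // new String("l") guarantees a non-interned reference,
        // like a typecode that was deserialized at runtime.
        Object typecode = new String("l");

        System.out.println(isLongTypecodeByReference(typecode)); // prints "false"
        System.out.println(isLongTypecodeByValue(typecode));     // prints "true"

        List<Long> values = Arrays.asList(3_000_000_000L, Long.MAX_VALUE);

        // Narrowing 3_000_000_000L to int wraps to -1294967296;
        // Long.MAX_VALUE narrows to -1.
        System.out.println(Arrays.toString(toIntArray(values)));
        // prints "[-1294967296, -1]"

        System.out.println(Arrays.toString(toLongArray(values)));
        // prints "[3000000000, 9223372036854775807]"
    }
}
```

Writing the literal first, as in `"l".equals(args[0])`, also avoids a `NullPointerException` if the argument is `null`, and works when the argument is declared as `Object`.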
