Jubin Soni created FLINK-39951:
----------------------------------
Summary: [Python] ArrayConstructor uses == for String comparison,
silently truncating long array values to int
Key: FLINK-39951
URL: https://issues.apache.org/jira/browse/FLINK-39951
Project: Flink
Issue Type: Bug
Components: API / Python
Reporter: Jubin Soni
*Summary:*
{{ArrayConstructor}} uses String reference equality ({{==}}) instead of value
equality ({{.equals()}}) for the Python {{'l'}} typecode, causing incorrect
deserialization of long arrays.
*Description:*
In {{ArrayConstructor.java}}, the typecode check uses reference equality
({{==}}) rather than value equality ({{.equals()}}) when checking for
Python's {{'l'}} (long) typecode:
{{if (args.length == 2 && args[0] == "l") {}}
*File:*
[https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/api/common/python/pickle/ArrayConstructor.java]
Line: 30
Because {{args[0]}} is a {{String}} object created during deserialization at
runtime, it is not guaranteed to be the same interned instance as the string
literal {{"l"}}. When the two are distinct objects, the comparison evaluates to
{{false}} even though their contents are identical, making the {{long[]}}
handling path effectively unreachable.
Consequently, arrays with typecode {{'l'}} fall through to
{{super.construct()}} and are not deserialized through the intended {{long[]}}
path.
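The difference between the two comparisons can be shown in isolation. The following is a minimal sketch, not Flink code: the class {{TypecodeComparisonDemo}} and its method names are hypothetical, and {{new String("l")}} stands in for a typecode string produced at runtime by deserialization.

```java
// Hypothetical demo class; not part of Flink.
final class TypecodeComparisonDemo {

    // Mirrors the reported check: compares object identity, not contents.
    static boolean matchesByReference(Object typecode) {
        return typecode == "l";
    }

    // Mirrors the proposed check: compares contents and is null-safe.
    static boolean matchesByValue(Object typecode) {
        return "l".equals(typecode);
    }

    public static void main(String[] args) {
        // new String(...) always creates a distinct, non-interned object,
        // standing in for a string built during deserialization.
        String runtimeTypecode = new String("l");

        System.out.println(matchesByReference(runtimeTypecode)); // false
        System.out.println(matchesByValue(runtimeTypecode));     // true
    }
}
```

Putting the literal on the left-hand side of {{.equals()}} also avoids a {{NullPointerException}} if the argument is ever {{null}}.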
*Steps to Reproduce:*
# Create a Python array with typecode {{'l'}} containing values larger than
{{Integer.MAX_VALUE}} (for example, {{3000000000}}).
# Pass the array through Flink's Python-to-Java serialization/deserialization
path.
# Read the resulting values on the Java side.
*Expected Result:*
Values are preserved as 64-bit longs and deserialized correctly.
*Actual Result:*
The {{'l'}} typecode branch is never taken, and values may be incorrectly
handled, potentially resulting in truncation or corruption when large values
are processed.
*Impact:*
This can lead to silent data corruption for Python arrays containing 64-bit
integer values. Users may receive incorrect results without any exception or
warning, particularly when values exceed the 32-bit integer range.
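To illustrate why such corruption would be silent, the sketch below shows what a 64-to-32-bit narrowing does to the example value {{3000000000}}. It assumes the fallback path ends up narrowing values to {{int}}, which this report suggests but does not demonstrate; the class {{NarrowingDemo}} is hypothetical and does not reproduce Flink's actual code path.

```java
// Hypothetical demo class; illustrates Java narrowing only, not Flink's code path.
final class NarrowingDemo {

    // A plain cast keeps the low 32 bits and never throws.
    static int narrow(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        long original = 3_000_000_000L; // larger than Integer.MAX_VALUE (2147483647)

        // 3000000000 - 2^32 = -1294967296
        System.out.println(narrow(original)); // -1294967296

        // By contrast, Math.toIntExact reports the overflow instead of hiding it.
        try {
            Math.toIntExact(original);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```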
*Proposed Fix:*
Replace:
{{if (args.length == 2 && args[0] == "l") {}}
with:
{{if (args.length == 2 && "l".equals(args[0])) {}}
This correctly performs value-based string comparison and ensures the intended
{{long[]}} deserialization path is executed.
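A self-contained sketch of how the corrected check lets a runtime-created typecode string reach a {{long[]}} branch is shown below. The class {{LongArrayBranchSketch}} is a hypothetical stand-in, not the Flink class: it assumes the second argument is a list of {{Number}} values, and it replaces the {{super.construct()}} fallback with an exception so that it compiles without the pickle library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the fixed branch; not the Flink class itself.
final class LongArrayBranchSketch {

    static Object construct(Object[] args) {
        // Value-based comparison, as in the proposed fix.
        if (args.length == 2 && "l".equals(args[0])) {
            // Assumption: the payload is a list of Number values.
            List<?> values = (List<?>) args[1];
            long[] result = new long[values.size()];
            for (int i = 0; i < result.length; i++) {
                result[i] = ((Number) values.get(i)).longValue();
            }
            return result;
        }
        // Placeholder for the real fallback to super.construct(args).
        throw new UnsupportedOperationException("other typecodes are out of scope for this sketch");
    }

    public static void main(String[] unused) {
        List<Object> values = new ArrayList<>();
        values.add(3_000_000_000L);

        // new String("l") simulates a typecode string created at runtime.
        Object out = construct(new Object[] {new String("l"), values});
        System.out.println(Arrays.toString((long[]) out)); // [3000000000]
    }
}
```

A regression test for the actual fix could follow the same shape: pass a non-interned {{"l"}} and a value above {{Integer.MAX_VALUE}}, then assert that the result is a {{long[]}} with the value intact.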