[ 
https://issues.apache.org/jira/browse/FLINK-39951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-39951:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Python] ArrayConstructor uses == for String comparison, silently truncating 
> long array values to int
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39951
>                 URL: https://issues.apache.org/jira/browse/FLINK-39951
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>            Reporter: Jubin Soni
>            Priority: Major
>              Labels: pull-request-available
>
> *Summary:*
> ArrayConstructor uses String reference equality ({{==}}) instead of value 
> equality ({{.equals()}}) for the Python {{'l'}} typecode, causing incorrect 
> deserialization of long arrays.
> *Description:*
> In {{ArrayConstructor.java}}, the typecode check uses reference equality 
> ({{==}}) rather than value equality ({{.equals()}}) when checking for 
> Python's {{'l'}} (long) typecode:
>  
> {{if (args.length == 2 && args[0] == "l") {}}
> *File:*
> [https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/api/common/python/pickle/ArrayConstructor.java]
> Line: 30
> Because {{args[0]}} is a deserialized {{String}} object at runtime, it is not 
> guaranteed to be the same interned instance as the string literal {{"l"}}. 
> In that case the comparison evaluates to {{false}}, making the {{long[]}} 
> handling path effectively unreachable.
> Consequently, arrays with typecode {{'l'}} fall through to 
> {{super.construct()}}, resulting in incorrect deserialization behavior.
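
To illustrate the point above, here is a minimal, self-contained sketch of the two comparisons. It does not use the actual Flink class: TypecodeCheck and its method names are hypothetical, and the runtime string is built with new String(...) only as a stand-in for a value produced by deserialization.

```java
// Hypothetical helper class, not part of Flink; it only mirrors the two
// comparison styles discussed in the ticket.
final class TypecodeCheck {

    // Mirrors the reported check: compares object identity, not content.
    static boolean isLongTypecodeByReference(Object typecode) {
        return typecode == "l";
    }

    // Mirrors the proposed fix: compares content and is null-safe.
    static boolean isLongTypecodeByValue(Object typecode) {
        return "l".equals(typecode);
    }

    public static void main(String[] args) {
        // Assumption: a String created at runtime stands in for the
        // deserialized typecode. It is a distinct object from the literal "l".
        Object runtimeTypecode = new String(new char[] {'l'});

        System.out.println(isLongTypecodeByReference(runtimeTypecode)); // false
        System.out.println(isLongTypecodeByValue(runtimeTypecode));     // true
    }
}
```

Writing the literal first, as in "l".equals(x), also avoids a NullPointerException if the argument is null.
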
> *Steps to Reproduce:*
>  # Create a Python array with typecode {{'l'}} containing values larger than 
> {{Integer.MAX_VALUE}} (for example, {{3000000000}}).
>  # Pass the array through Flink's Python-to-Java 
> serialization/deserialization path.
>  # Read the resulting values on the Java side.
> *Expected Result:*
> Values are preserved as 64-bit longs and deserialized correctly.
> *Actual Result:*
> The {{'l'}} typecode branch is never taken, and values may be incorrectly 
> handled, potentially resulting in truncation or corruption when large values 
> are processed.
> *Impact:*
> This can lead to silent data corruption for Python arrays containing 64-bit 
> integer values. Users may receive incorrect results without any exception or 
> warning, particularly when values exceed the 32-bit integer range.
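
As a rough illustration of what silent truncation looks like, the sketch below narrows the example value from the reproduction steps to a 32-bit int. This is an assumption about the kind of corruption described, not a trace of what the fall-through path in the parent class actually does; NarrowingDemo and narrowToInt are hypothetical names.

```java
// Hypothetical demo class, not part of Flink. It shows plain Java narrowing
// of a 64-bit value to 32 bits, which raises no exception or warning.
final class NarrowingDemo {

    // A narrowing cast keeps only the low 32 bits of the value.
    static int narrowToInt(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        long original = 3_000_000_000L; // larger than Integer.MAX_VALUE (2147483647)
        int narrowed = narrowToInt(original);

        // 3000000000 - 2^32 = -1294967296
        System.out.println(original + " -> " + narrowed); // 3000000000 -> -1294967296
    }
}
```
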
> *Proposed Fix:*
> Replace:
>  
> {{if (args.length == 2 && args[0] == "l") {}}
> with:
>  
> {{if (args.length == 2 && "l".equals(args[0])) {}}
> This correctly performs value-based string comparison and ensures the 
> intended {{long[]}} deserialization path is executed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
