lzaeh opened a new pull request, #2703:
URL: https://github.com/apache/fory/pull/2703

   <!--
   **Thanks for contributing to Apache Fory™.**
   
   **If this is your first time opening a PR on fory, you can refer to 
[CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).**
   
   Contribution Checklist
   
       - The **Apache Fory™** community has requirements on the naming of pr 
titles. You can also find instructions in 
[CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).
   
       - Apache Fory™ has a strong focus on performance. If the PR you submit 
will have an impact on performance, please benchmark it first and provide the 
benchmark result here.
   -->
   
   ## Why?
   In Go, the list serializer previously lacked certain fast-path matches and 
didn’t handle some nested cases well during deserialization; this PR fixes part 
of those issues.
   
   <!-- Describe the purpose of this PR. -->
   
   ## What does this PR do?
   ### 1. It enhances the slice serializer in Go, mainly by:
   
   - aligning the protocol handling steps;
   
   - adding the fast path for homogeneous cases;
   
   - restoring element types in []interface{} as accurately as possible during 
deserialization. Specifically:
   
        - For struct, array, and slice: []T, we can faithfully recover the 
inner element types on receipt;
   
        - For map contained in []interface{}, we can only receive it as 
map[interface{}]interface{} for now; I haven’t found a better approach yet.
   
   ### 2. Primary function/file changes:
   
   - To accommodate scenarios where needWriteRef requires a “pending” state 
rather than a strict boolean, I followed Python and changed it to int8, using 1 
for need, 0 for not-need, and -1 for pending.
   
   - Added a type parameter to getTypeInfo so that we can query typeinfo in 
situations where only a type is available; forcing the target to carry a value 
would be unnecessarily bloated in these cases.
   
   - I moved the array serializer into array.go. I’m open to feedback on 
whether this is appropriate. The array serializer changes focus on:
    a. enabling []T to receive [n]T;
    b. due to Go’s type registration constraints, resource metadata appears to 
be registered via the slice type. Therefore, when serializing arrays we convert 
them to the corresponding slice type to fetch resources; during 
deserialization, read reconstructs an array as the receiver.
   
   ### 3. Known gaps / not yet addressed:
   
   - Zero-copy and out-of-band memory tests are failing; I’ll continue studying 
and iterating on them.
   
   - From the docs, multi-dimensional arrays of primitive types, like 
[x][y][z]primitiveTypes, are categorized as tensors. The current Go side treats 
them as slices, which is out of spec and needs attention.
   
   - Some fast paths in the list serializer (Go’s slice serializer), especially 
those related to int, are not fully understood. I only read the Python code and 
implemented parts based on assumptions, which is not rigorous and is 
problematic. I will read the Java code and documentation to clarify.
   
   - I didn’t handle issue #2639. The id removal there is tied to the 
registration refactor. I was informed that a teammate has already completed 
that refactor, so I left it untouched for now.
   
   
   ### 4. Newly discovered issue (flag semantics diverge between Java and 
Python for the list serializer):
   
   - In Python, the flags are: COLLECTION_NOT_DECL_ELEMENT_TYPE = 0b100 and 
COLLECTION_NOT_SAME_TYPE = 0b1000
   In Java, they are: IS_DECL_ELEMENT_TYPE = 0b100 and IS_SAME_TYPE = 0b1000
   
   - The bit values are identical, but the semantics are inverted. Initially I 
suspected the code matched them inversely, until I created a List<String> in 
Java, serialized it, copied the bytes verbatim into Python, and attempted 
cross-language deserialization. It failed for the following reasons:
   
   - Under Java semantics, serializing List<String> is considered “declared and 
same type.” In my test the final list flag was 8; the deserialization debug 
trace also showed it being treated as declared and same type.
   <img width="871" height="528" alt="image" 
src="https://github.com/user-attachments/assets/2b29c6a9-7e88-4829-a34a-9f7ebdd57d4b";
 />
   
   
   - In Python, it reads the same flag = 8, but due to different semantics and 
flow, Python interprets it as “non-declarative.” Because it is treated as 
non-declarative, Python proceeds to read the element type ID from the byte 
stream; however,  declarative + homogeneous path does not write element type 
IDs, which causes byte misalignment.
   <img width="879" height="132" alt="image" 
src="https://github.com/user-attachments/assets/464fe426-fd2a-49b9-b35c-d5acc7e8a614";
 />
   
   The aforementioned naive test is as follows:
   <img width="591" height="371" alt="image" 
src="https://github.com/user-attachments/assets/4bae07a9-1555-48c9-a1bd-acc5e2845cd5";
 />
   <img width="846" height="151" alt="image" 
src="https://github.com/user-attachments/assets/ee8c707b-ac55-4d70-90be-d047b1f19be7";
 />
   
   
   
   I’m wondering whether there’s a reason I’ve overlooked, whether my usage is 
incorrect, and how this should be handled properly.
   
   ### 5. Questions and concerns:
   
   - To pass cross-language tests, I currently aligned with Python semantics. 
But small Java-side tests surfaced the issue above. This looks like a semantic 
mismatch in flags, yet it effectively means that when serializing the same 
list<T>, one side treats it as declarative while the other as non-declarative. 
Because of this, I temporarily skipped one batch test for slices locally. 
Although I aligned with Python semantics in implementation, my personal view 
is: for one-dimensional slices whose element type T is not map, they can 
generally be treated as declarative constructions, and the receiver should 
restore elements with accurate types where possible. I’m not sure if this is 
correct.
   
   - If that understanding is incorrect or incomplete, I’m curious about the 
exact conditions under which a “declarative construction” of list is triggered 
(I may have missed details in the documentation).
   
   - map is particularly tricky. Both slice and map have unique IDs but may 
vary in concrete types in Go. When []interface{} contains such values, 
deserialization becomes challenging. Given the slice design, I added some 
admittedly inelegant code (with ample room for refactoring) to accurately 
receive various []T where T ≠ map inside []interface{}. However, if 
[]interface{} contains a map, I still haven’t found a way to place it into a 
precisely-typed container; the fallback is to use map[interface{}]interface{} 
and rely on user-side handling. I’m not sure whether this approach is 
acceptable.
   
   I’d like this to be a draft PR; I’ll work on fixing the remaining two or 
three tests and continue exploring how to refactor the code introduced in this 
PR.
   
   
   <!-- Describe the details of this PR. -->
   
   ## Related issues
   
   - #2633 
   
   <!--
   Is there any related issue? If this PR closes them you say say fix/closes:
   
   - #xxxx0
   - #xxxx1
   - Fixes #xxxx2
   -->
   
   ## Does this PR introduce any user-facing change?
   
   <!--
   If any user-facing interface changes, please [open an 
issue](https://github.com/apache/fory/issues/new/choose) describing the need to 
do so and update the document if necessary.
   
   Delete section if not applicable.
   -->
   
   - [ ] Does this PR introduce any public API change?
   - [ ] Does this PR introduce any binary protocol compatibility change?
   
   ## Benchmark
   
   <!--
   When the PR has an impact on performance (if you don't know whether the PR 
will have an impact on performance, you can submit the PR first, and if it will 
have impact on performance, the code reviewer will explain it), be sure to 
attach a benchmark data here.
   
   Delete section if not applicable.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to