LouisLou2 opened a new issue, #2307: URL: https://github.com/apache/fory/issues/2307
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/fory/issues) and found no similar issues.

### Version

latest commit (8d028b47d900a31d86c7b67c37f952ac36b30d6a)

### Component(s)

Java

### Minimal reproduce step

The following Java code demonstrates the issue:

```java
package org.apache.fory;

import org.apache.fory.config.Language;

public class Main {
  private static class SomeClass1 {
    int number;
    // Adding a field to make it slightly different in structure, though name is the primary issue
    String name = "SomeClass1";
  }

  private static class SomeClass2 {
    int number;
    String name = "SomeClass2";
  }

  public static void main(String[] args) {
    SomeClass1 object1 = new SomeClass1();
    object1.number = 10;
    SomeClass2 object2 = new SomeClass2();
    object2.number = 20;

    Fory fory = Fory.builder()
        .withLanguage(Language.XLANG)
        .build();

    // Register two classes with names that only differ by case.
    // XLANG might handle these in a way that leads to a hash collision
    // for the MetaStringBytes cache key if encoding is not considered.
    fory.register(SomeClass1.class, "aclass");
    fory.register(SomeClass2.class, "Aclass"); // Note the capital 'A'

    System.out.println("Serializing object1 (SomeClass1 as \"aclass\")");
    byte[] bytes1 = fory.serialize(object1);
    System.out.println("Deserializing object1...");
    SomeClass1 deserializedObject1 = (SomeClass1) fory.deserialize(bytes1);
    System.out.println("Deserialized object1.number: " + deserializedObject1.number);

    System.out.println("Serializing object2 (SomeClass2 as \"Aclass\")");
    byte[] bytes2 = fory.serialize(object2);
    System.out.println("Deserializing object2...");
    // This is where the ClassCastException occurs
    SomeClass2 deserializedObject2 = (SomeClass2) fory.deserialize(bytes2);
    System.out.println("Deserialized object2.number: " + deserializedObject2.number);
  }
}
```

### What did you expect to see?

I expected the program to run without exceptions.
`object1` should be successfully serialized and deserialized back into a `SomeClass1` instance. `object2` should be successfully serialized and deserialized back into a `SomeClass2` instance. The output should show the correct numbers for both deserialized objects.

### What did you see instead?

The program throws a `java.lang.ClassCastException` when attempting to deserialize `object2`:

```
INFO Fory:165 [main] - Created new fory org.apache.fory.Fory@xxxxxxxx
Serializing object1 (SomeClass1 as "aclass")
Deserializing object1...
Deserialized object1.number: 10
Serializing object2 (SomeClass2 as "Aclass")
Deserializing object2...
Exception in thread "main" java.lang.ClassCastException: org.apache.fory.Main$SomeClass1 cannot be cast to org.apache.fory.Main$SomeClass2
    at org.apache.fory.Main.main(Main.java:31)
```

(Note: the line number in the stack trace might vary slightly based on exact code formatting/additions.)

This indicates that `fory.deserialize(bytes2)` returned an instance of `SomeClass1` when it was expected to return `SomeClass2`.

### Anything Else?

The root cause appears to be related to how `MetaStringBytes` are cached and retrieved in the `org.apache.fory.io.FuryStreamReader#readSmallMetaStringBytes` method (or a similar method responsible for reading meta strings). The suspected code snippet is:

```java
private MetaStringBytes readSmallMetaStringBytes(MemoryBuffer buffer, int len) {
  byte encoding = buffer.readByte();
  if (len == 0) {
    assert encoding == MetaString.Encoding.UTF_8.getValue();
    return MetaStringBytes.EMPTY;
  }
  long v1, v2 = 0;
  if (len <= 8) {
    v1 = buffer.readBytesAsInt64(len);
  } else {
    v1 = buffer.readInt64();
    v2 = buffer.readBytesAsInt64(len - 8);
  }
  // The key for the map is (v1, v2)
  MetaStringBytes byteString = longLongMap.get(v1, v2);
  if (byteString == null) {
    byteString = createSmallMetaStringBytes(len, encoding, v1, v2);
  }
  return byteString;
}
```

The `longLongMap` uses `(v1, v2)` as the cache key.
These `long` values are derived from the byte representation of the string. However, the `encoding` byte itself is not part of this cache key. If two different strings (e.g., class names "aclass" and "Aclass" when `Language.XLANG` is used, which might have specific case handling or normalization for registered names) happen to produce the same `(v1, v2)` hash from their byte content *but have different `encoding` values, or represent different semantic strings despite sharing a hash*, the cache can return an incorrect `MetaStringBytes` object.

In the provided example:

1. `SomeClass1` is registered as "aclass". When serialized, "aclass" (and its encoding) is written and potentially cached in `longLongMap` with a key `(v1, v2)`.
2. `SomeClass2` is registered as "Aclass". When serialized, "Aclass" is written. If its byte representation (possibly after XLANG processing) leads to the *same* `(v1, v2)` key as "aclass", `longLongMap.get(v1, v2)` might return the cached `MetaStringBytes` for "aclass" instead of creating/returning one for "Aclass".
3. During deserialization of `object2` (which should be `SomeClass2` linked to "Aclass"), if the system retrieves the `MetaStringBytes` for "aclass" due to this cache collision, it will then attempt to instantiate `SomeClass1`, leading to the `ClassCastException`.

The `encoding` byte (and potentially the actual string content itself if `v1, v2` is just a hash) should be considered part of the uniqueness for `MetaStringBytes` to prevent such collisions. The current `LongLongMap` keying strategy seems insufficient if `v1` and `v2` alone can collide for semantically different strings (especially when considering different encodings or case variations managed by XLANG).

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
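
The suspected collision described in the issue can be illustrated in isolation. The sketch below is not Fory code: `MetaStringCacheSketch`, `Entry`, and the numeric encoding tags and payload value are hypothetical stand-ins, and a plain `HashMap` replaces `LongLongMap`. It only assumes what the issue suspects, namely that two names can share the same `(v1, v2)` payload while carrying different `encoding` bytes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetaStringCacheSketch {
  /** Hypothetical stand-in for MetaStringBytes: an encoding tag plus the packed payload words. */
  static final class Entry {
    final byte encoding;
    final long v1;
    final long v2;

    Entry(byte encoding, long v1, long v2) {
      this.encoding = encoding;
      this.v1 = v1;
      this.v2 = v2;
    }
  }

  private final boolean keyIncludesEncoding;
  private final Map<List<Long>, Entry> cache = new HashMap<>();

  MetaStringCacheSketch(boolean keyIncludesEncoding) {
    this.keyIncludesEncoding = keyIncludesEncoding;
  }

  /** Returns the cached entry for the key, creating it on first sight. */
  Entry lookup(byte encoding, long v1, long v2) {
    // With keyIncludesEncoding == false the key mirrors the suspected (v1, v2)-only lookup.
    List<Long> key = keyIncludesEncoding
        ? Arrays.asList(v1, v2, (long) encoding)
        : Arrays.asList(v1, v2);
    Entry cached = cache.get(key);
    if (cached == null) {
      cached = new Entry(encoding, v1, v2);
      cache.put(key, cached);
    }
    return cached;
  }

  public static void main(String[] args) {
    // Made-up values: identical payload words, different encoding tags.
    byte encodingA = 1;
    byte encodingB = 2;
    long payload = 0x1234L;

    MetaStringCacheSketch payloadOnly = new MetaStringCacheSketch(false);
    payloadOnly.lookup(encodingA, payload, 0L);
    Entry stale = payloadOnly.lookup(encodingB, payload, 0L);
    // The second lookup hits the first entry, so the encoding tag is wrong.
    System.out.println("payload-only key returns encoding " + stale.encoding); // prints 1

    MetaStringCacheSketch withEncoding = new MetaStringCacheSketch(true);
    withEncoding.lookup(encodingA, payload, 0L);
    Entry fresh = withEncoding.lookup(encodingB, payload, 0L);
    System.out.println("encoding-aware key returns encoding " + fresh.encoding); // prints 2
  }
}
```

If the suspicion holds, folding the encoding byte into the key (or comparing it on a cache hit) would keep the two registered names distinct. Whether that is the right fix for the real `LongLongMap` path is left to the maintainers.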
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
