jayhan94 commented on PR #2057: URL: https://github.com/apache/fury/pull/2057#issuecomment-2663402477
If the key-value pairs in the map are small (just a few bytes), the overhead is mostly concentrated in deserializing a few integers. I suspect these integers are metadata such as lengths and headers. Their presence weakens the virtual function call optimization for the string serializer, and performance instead decreases because of branch mispredictions and type comparisons.

<img width="1103" alt="image" src="https://github.com/user-attachments/assets/193f4235-0952-4b92-8e43-c6bc38a67790" />

I tried increasing the length of the keys and values to 30 and 100 bytes (stringRatio=0.4), and the regression became smaller. However, the expected optimization still didn't occur.

Results with large key-value pairs:

```
Before:
Benchmark                                (mapSize)  (stringRatio)   Mode  Cnt       Score        Error  Units
StringMapSerializationSuite.deserialize         50            0.4  thrpt    5  539543.330  ±  7888.817  ops/s

After:
Benchmark                                (mapSize)  (stringRatio)   Mode  Cnt       Score        Error  Units
StringMapSerializationSuite.deserialize         50            0.4  thrpt    5  513941.928  ± 21391.914  ops/s
```

@chaokunyang @pandalee99 Thanks for your review and patience. Unless you have a different opinion, I'll close this PR as a failed change.
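
For readers following the thread, here is a minimal sketch of the kind of type-check fast path the comment describes. It is only one plausible shape of such an optimization, not the code from this PR: `FastPathSketch`, `Reader`, `StringReader`, and `IntReader` are hypothetical names invented for illustration. The idea is that a class comparison guards a direct call to a final serializer, which only pays off when the guard is cheap relative to the work it saves and the branch is predictable.

```java
/** Hypothetical names throughout; this is not Fury's actual serializer code. */
final class FastPathSketch {

    /** Stand-in for a polymorphic serializer interface. */
    interface Reader {
        Object read(String raw);
    }

    /** Final class, so a call through a StringReader-typed reference has a single target. */
    static final class StringReader implements Reader {
        @Override
        public Object read(String raw) {
            return raw;
        }
    }

    /** A second implementation, so the call site is not trivially monomorphic. */
    static final class IntReader implements Reader {
        @Override
        public Object read(String raw) {
            return Integer.valueOf(raw);
        }
    }

    /** Generic path: one interface call per element. */
    static Object readGeneric(Reader reader, String raw) {
        return reader.read(raw);
    }

    /**
     * Fast path: a class comparison guards a direct call to the final StringReader.
     * When reader types are mixed, the comparison is an extra branch that may be
     * hard to predict, which is the cost the comment attributes the regression to.
     */
    static Object readWithFastPath(Reader reader, String raw) {
        if (reader.getClass() == StringReader.class) {
            return ((StringReader) reader).read(raw);
        }
        return reader.read(raw);
    }

    public static void main(String[] args) {
        Reader[] readers = {new StringReader(), new IntReader()};
        String[] inputs = {"abc", "42"};
        for (int i = 0; i < readers.length; i++) {
            Object generic = readGeneric(readers[i], inputs[i]);
            Object fast = readWithFastPath(readers[i], inputs[i]);
            // Both paths must produce the same value; only the dispatch differs.
            System.out.println(generic.equals(fast));
        }
    }
}
```

Both methods return the same values, so any difference between them is purely a matter of dispatch cost, and that can only be judged with a benchmark such as the JMH run quoted above.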
