belugabehr commented on PR #3194: URL: https://github.com/apache/avro/pull/3194#issuecomment-2629242178
I am taking another pass at this one. The performance improvements are just too significant to pass up (11%+ in my local benchmarhks). Here are my benchmarks: ``` # JMH version: 1.35 # VM version: JDK 17.0.13, OpenJDK 64-Bit Server VM, 17.0.13+11-Ubuntu-2ubuntu124.04 # VM invoker: /usr/lib/jvm/java-17-openjdk-amd64/bin/java # This is from AVRO-4069 (the worst result I measured for this branch) Result "com.szymon_kaluza.protobuf_avro.benchmark.AvroBench.serializeAndDeserializeSmall": 889717.049 ±(99.9%) 9103.527 ops/s [Average] (min, avg, max) = (860194.914, 889717.049, 909429.461), stdev = 13625.731 CI (99.9%): [880613.522, 898820.576] (assumes normal distribution) # This is from master (the best result I measured for this branch) Result "com.szymon_kaluza.protobuf_avro.benchmark.AvroBench.serializeAndDeserializeSmall": 796084.662 ±(99.9%) 12190.142 ops/s [Average] (min, avg, max) = (764448.008, 796084.662, 850984.160), stdev = 18245.632 CI (99.9%): [783894.521, 808274.804] (assumes normal distribution) ``` The reason I even investigated this is because Visual VM profiling was lighting up on `IdentityHashMap` usage. By migrating to HashMap, I was able to achieve most of the performance gains. Also, another thing that I hadn't considered before is that `IdentityHashMap` uses object reference-equality for insertions and lookups. This means that the Map can look different for each instantiation of Avro because obviously the reference values will be different every time it runs. This made benchmarking kind of tricky and unreliable as well. If the keys just happened to have a lot of collisions, then performance would suffer, alternatively, benchmarking improved if the keys happen to be laid out more evenly in the bucket space. Yet another reason to simply get off this implementation. There is one caveat in the JavaDocs: > For many JRE implementations and operation mixes, this class will yield better performance than [HashMap](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html) (which uses chaining rather than linear-probing). * https://docs.oracle.com/javase/8/docs/api/java/util/IdentityHashMap.html I get around this by exploiting the fact that the number of items in the map is fixed as instantiation. Therefore, I use a Map size sufficiently large such that there are no key collisions and therefore no chaining involved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org