Hi, I am trying to use a plain Java consumer (over SSL) to consume a very large amount of historic data (20+ TB across 20+ partitions). Consumption performance is very low, even when fully parallelized.
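For concreteness, the Java side is essentially a minimal no-op consumer along these lines (a simplified sketch: the broker address, group id, and topic name below are placeholders, and the SSL settings are elided, not our real config):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class NoOpConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9093"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "noop-benchmark");       // placeholder
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Byte-array deserializers, so there is no real deserialization cost.
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                ByteArrayDeserializer.class.getName());
        // SSL settings elided (security.protocol, truststore location, etc.);
        // everything else is left at the client defaults.

        long count = 0;
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("historic-data")); // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // No-op apart from a progress line every 100 events.
                    if (++count % 100 == 0) {
                        System.out.println("consumed " + count + " records");
                    }
                }
            }
        }
    }
}
```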
What we are seeing:
- about *200k rec/s* with the Java consumer versus *950k rec/s* with librdkafka
- about *1 gigabit/s* with the Java consumer versus *5.3 gigabit/s* with librdkafka

Both applications are doing no-ops (consume, deserialize as byte arrays, print a line for every 100 events). Both applications are using defaults (including the same fetch sizes, maximums, batch sizes, etc.). Neither the Java processes nor the kafkacat instances appear to be starved for resources (CPU, memory, etc.). Everything is being run in exactly the same environment with the same resources, but the Java Kafka client is just incredibly slow.

Setup:
- Java Kafka client version 2.4.x
- JDK 11 (I believe there was an SSL performance issue that required upgrading to at least JDK 11)

Am I doing something wrong here? The last time I tested the performance difference between these two libraries was years ago, and librdkafka was somewhat faster in most cases, but certainly not 5x faster in a no-op scenario. Is this in line with expectations? Any thoughts or suggestions would be very much appreciated.

Thanks,
Adam