Hello,

I'm facing an issue similar to the one Antonin described in https://mail.mozilla.org/pipermail/heka/2015-April/000522.html, where Heka is slow at consuming messages from Kafka (around 1,600 messages/s).
I have set up a Kafka instance into which I dump a truckload of log messages. Without tuning anything, Heka is able to read from files and output roughly 100,000 lines/s of logs to Kafka using the Kafka output plugin.

I have 3 distinct servers (all VMs):
- 1 heka used to feed Kafka (producer): this one does at least 100,000 messages/s with a basic config [1], running heka 0.9.2
- 1 kafka (standard config), running kafka 0.8.2.1
- 1 heka used to parse data coming from kafka (consumer), running heka 0.9.2

This looks like a convoluted setup, but it is required by the way I receive the logs (meaning I can't really merge the 2 heka instances, especially if I want to add more consumers).

If I set up my heka consumer (the last server in the list above) as a simple heka input without any output, decoder or anything else (cf. config in [2]), I only get around 1,600 messages/s from Kafka.

I know Kafka can do better because locally (on the kafka server) I can easily get around 354,000 messages/s using kafka-console-consumer.sh. And remotely (i.e. on the consumer machine), also using kafka-console-consumer.sh, I get around 340,000 messages/s.

I've played with maxprocs, plugin_chansize, max_process_inject, poolsize, default_fetch_size, event_buffer_size, max_open_reqests, background_refresh_frequency and max_wait_time, and I haven't been able to do much better.

The only way I've managed to get more messages/s is to duplicate my input (and to run more heka servers), i.e. to declare a KafkaInput2 block similar to KafkaInput1.

Ideally I'd like to sustain around 80,000 messages/s, or at least be CPU bound by my decoders. Right now the bottleneck is the input.

Is this something anyone has experienced? Is there something wrong in my setup?

Wait, I just discovered something: if I switch the offset_method to oldest, I get around 5,000 messages/s instead of 1,600. Why would that be?
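To put the duplicated-input workaround in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes throughput scales linearly with the number of KafkaInput blocks, which is an unverified assumption and not something measured above; the function and variable names are purely illustrative.

```python
# Back-of-the-envelope estimate: how many duplicated inputs would be needed
# to reach the target rate, ASSUMING linear scaling per input (unverified).

TARGET_RATE = 80_000  # messages/s, the target stated in this mail

# Per-input rates reported in this mail (messages/s).
MEASURED_RATES = {
    "as first measured": 1_600,
    "offset_method set to oldest": 5_000,
}


def inputs_needed(target: int, per_input: int) -> int:
    """Smallest number of inputs whose combined rate reaches the target."""
    if per_input <= 0:
        raise ValueError("per_input must be positive")
    return -(-target // per_input)  # integer ceiling division


if __name__ == "__main__":
    for label, rate in MEASURED_RATES.items():
        count = inputs_needed(TARGET_RATE, rate)
        print(f"{label}: {rate} msg/s per input, {count} inputs needed")
```

Under that assumption the workaround needs tens of inputs in either case, which is why a fix on the input side matters more than adding inputs.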
Cheers,
Mathieu

============================================================
[1] Producer config

[hekad]
maxprocs = 4
base_dir = "/data/heka-0_9_2-linux-amd64"
share_dir = "/data/heka-0_9_2-linux-amd64/share/heka"
poolsize = 300

[syslog]
type = "LogstreamerInput"
log_directory = "/data/syslog/log"
oldest_duration = "1h"
file_match = 'heka_(?P<Year>\d+)-(?P<Month>\d+)-(?P<Day>\d+)-(?P<Hour>\d+)-(?P<Minute>\d+).log'
priority = ["Year", "Month", "Day", "Hour", "Minute"]

[FxaKafkaOutput]
type = "KafkaOutput"
message_matcher = "TRUE"
topic = "logs"
addrs = ["10.100.100.37:9092"]
encoder = "ProtobufEncoder"
partitioner = "RoundRobin"

[DashboardOutput]
ticker_interval = 5

============================================================
[2] Consumer config

[hekad]
maxprocs = 4
base_dir = "/home/heka/heka"
share_dir = "/home/heka/heka/share/heka"
plugin_chansize = 300
max_process_inject = 40
poolsize = 2000

[KafkaInput1]
type = "KafkaInput"
topic = "logs"
addrs = ["10.100.100.37:9092"]
splitter = "KafkaSplitter"
decoder = "ProtobufDecoder"

[KafkaSplitter]
type = "NullSplitter"
use_message_bytes = true

[DashboardOutput]
ticker_interval = 5

-- 
Mathieu Chouquet-Stringer
The sun itself sees not till heaven clears.
    -- William Shakespeare --

