On 09/22/2015 11:06 PM, Andre wrote:
Rob,
Thanks for the quick reply.
Just tested with PayloadEncoder but got similar results.
I find it hard to believe that switching from ESJsonEncoder to PayloadEncoder
didn't change your throughput. ESJsonEncoder is much slower than
PayloadEncoder. This is usually okay, however, b/c ESJsonEncoder is usually
used when sending data to ElasticSearch, and ElasticSearch itself is usually
much more of a bottleneck than Heka is.
But after reading your post I realised that:
1) My actual pipeline is:
TcpInput -> TokenSplitter -> PayloadRegexDecoder -> ESJsonEncoder ->
FileOutput
I don't understand why you'd want to use ESJsonEncoder with a FileOutput.
What's going on there?
Also, PayloadRegexDecoder is known to be slow and unwieldy. If you're doing any
parsing at all, you can get much better performance with a SandboxDecoder that
uses an LPEG grammar than you can with a PayloadRegexDecoder.
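As a rough sketch of that suggestion (the section name and Lua filename here are hypothetical placeholders), a SandboxDecoder is wired in via TOML pointing at a Lua file that holds the LPEG grammar:

```toml
# Hypothetical decoder section; the Lua file would contain the LPEG grammar.
[syslog_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/my_syslog_grammar.lua"

[TcpInput]
address = "127.0.0.1:5565"
splitter = "TokenSplitter"
decoder = "syslog_decoder"
```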
2) I should give it a go and remove TokenSplitter from the pipeline
Why would you remove TokenSplitter? If you want to split the records on
newlines, then TokenSplitter is what you want.
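For reference, a minimal sketch of wiring TokenSplitter into a TcpInput (address is a placeholder; the delimiter shown is the documented default, so the `[TokenSplitter]` section can usually be omitted for newline-delimited records):

```toml
[TcpInput]
address = "127.0.0.1:5565"
splitter = "TokenSplitter"

[TokenSplitter]
delimiter = "\n"
```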
Indeed, my EPS jumped to 21.6K (however, the messages no longer flowed
through the pipeline, as Heka no longer recognised them after input. :D )
I don't understand the context enough to know what you mean by this.
So I guess the bottleneck is either in the splitter or the regex
decoder (though the regex is a simple '^(?P<Payload>.*)').
PayloadRegexDecoder is generally pretty slow. And what's the point of using a
PayloadRegexDecoder when you're not even doing any decoding? If you want to
pass the payload through without any parsing, then don't use a decoder at all.
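A minimal pass-through configuration along those lines might look like this (address and output path are placeholders; `append_newlines` is shown at its documented default):

```toml
[TcpInput]
address = "127.0.0.1:5565"
splitter = "TokenSplitter"

[PayloadEncoder]
append_newlines = true

[FileOutput]
message_matcher = "TRUE"
path = "/tmp/heka_out.log"
encoder = "PayloadEncoder"
```

With no decoder configured, each record lands in the message payload untouched, which avoids the regex-matching cost entirely.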
So clearly I don't understand a lot of what you're describing here. If you want
meaningful help, you'll probably want to include the actual TOML configuration
that you're using, so we can see the full context of your setup. Also useful
would be describing what it is you're trying to accomplish, exactly.
-r
Cheers
On Wed, Sep 23, 2015 at 2:42 PM, Rob Miller <[email protected]> wrote:
Almost certainly the slowest part of your pipeline there is the
ESJsonEncoder. What does the throughput look like if you replace
that with a PayloadEncoder?
-r
On 09/22/2015 09:33 PM, Andre wrote:
Hi All,
I was doing some performance tests around hekad with a simple test
KVM VM running Ubuntu: 2 GB RAM, 4 cores, magnetic drives.
The test pipeline was the traditional vanilla pipeline similar to:
TcpInput -> TokenSplitter -> ESJsonEncoder -> FileOutput
The sample used was a stream of plain text TCP syslog generated by
syslog-ng's loggen tool.
Good news is that there's no message loss; bad news is that
performance is somewhat lacklustre. While a normal rsyslogd doing
similar work would do the same job at significantly higher rates,
hekad was kind of stuck around 12K EPS.
I started by setting maxprocs to 4, then tried playing with the
poolsize (setting it to a large value like 50000), but the only
consequence of that was hekad consuming more memory; EPS stayed more
or less the same.
Has anyone exceeded this performance with hekad under similar
pipelines (i.e. TcpInput) and HW conditions (small VMs)?
If yes, mind sharing a bit on how hekad was configured?
Kind regards
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka