Hi,

Comments inline,

On Thu, Aug 27, 2015 at 1:29 AM, Rob Miller <[email protected]> wrote:

> On 08/25/2015 02:16 PM, Giordano, J C. wrote:
>
>> Heka community:
>>
>> I would like to share my experiences with using Heka to parse Apache log
>> files for insertion into InfluxDB.  My initial testing & configuration
>> started with an out of the box configuration consisting of:
>>
>> 1) Heka (v0.11) & InfluxDB (v0.9.3) both running on a single server:
>> Ubuntu 14.04.2 LTS, trusty
>> 2) A single Apache access log, read from the local file system,
>> containing ~850K log entries.
>> 3) A Heka configuration of: LogstreamerInput -> SandboxDecoder:
>> apache_access.lua ->  SandboxEncoder: schema_influx_line.lua ->
>> HttpOutput to InfluxDB
>>
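(For context, that pipeline maps onto a hekad TOML configuration roughly like the following; the section names, paths, and match patterns are illustrative, not taken from the original setup:)

```toml
[apache_logs]
type = "LogstreamerInput"
log_directory = "/var/log/apache2"
file_match = 'access\.log'
decoder = "apache_decoder"

[apache_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/apache_access.lua"

    [apache_decoder.config]
    # Must match the LogFormat directive of the Apache server.
    log_format = '%h %l %u %t \"%r\" %>s %O'

[influx_encoder]
type = "SandboxEncoder"
filename = "lua_encoders/schema_influx_line.lua"

[influx_out]
type = "HttpOutput"
message_matcher = "Logger == 'apache_logs'"
address = "http://localhost:8086/write?db=apache"
encoder = "influx_encoder"
```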
>> The performance of this configuration was unsuitable for production,
>> taking over 12 hours to complete the processing of a single log
>> file.  Compared to a LogOutput, which completed in approximately
>> 3 minutes, it was clear I needed to batch-write records to
>> InfluxDB.
>>
> Yup, clearly unacceptable.
>
>> My initial attempt to batch records via Lua was a weak effort and
>> ultimately unsuccessful.  Attempting to queue records into a Lua table
>> (likely the incorrect approach)
>>
> If I were to do this, I'd encode the lines right in the filter, so the
> filter periodically emits a message payload where each line is an InfluxDB
> line. As you can see from looking at the encoder (
> https://github.com/mozilla-services/heka/blob/dev/sandbox/lua/encoders/schema_influx_line.lua#L142),
> all of the hard work is done in a reusable module. You'd just call
> `add_to_payload` in the process_message function and `inject_payload`
> in the timer_event function, then use a PayloadEncoder w/ the
> HttpOutput.
>

This is the approach we've taken in our project [1] and it's working great
so far. We batch the writes every 100 points or after a configurable
timeout. I can't comment on the memory issue, but we're not sending that
many points.

Having our own filter was also necessary because we handle messages
with different formats, which means the encoding into the InfluxDB payload
differs depending on the type of the incoming message.

[1]
https://github.com/stackforge/fuel-plugin-lma-collector/blob/master/deployment_scripts/puppet/modules/lma_collector/files/plugins/filters/influxdb_accumulator.lua
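For anyone curious, the core of such an accumulator is quite small. A
simplified sketch, with the flush threshold read from the config and
`encode_datapoint` standing in for our per-format encoding (it's a
hypothetical helper, not part of the Heka API):

```lua
-- Simplified accumulator filter: buffer InfluxDB line-protocol entries
-- and flush them as a single payload, either when the buffer reaches
-- flush_count entries or on the next timer_event tick.
local flush_count = read_config("flush_count") or 100

local buffer = {}

local function flush()
    if #buffer > 0 then
        inject_payload("txt", "influxdb", table.concat(buffer, "\n"))
        buffer = {}
    end
end

function process_message()
    -- encode_datapoint() stands in for turning the current Heka message
    -- into one InfluxDB line-protocol string.
    local line = encode_datapoint()
    if line then
        buffer[#buffer + 1] = line
    end
    if #buffer >= flush_count then
        flush()
    end
    return 0
end

function timer_event(ns)
    flush()
end
```

The payloads it emits can then go straight through a PayloadEncoder to the
HttpOutput, as Rob describes above.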


>> led to out-of-memory errors in the Lua
>> sandbox for batch sizes exceeding ~200 messages.
>>
> For sizeable batches you'd need to bump the memory_limit setting. You also
> might want to increase the instruction_limit and output_limit values. These
> are all covered in the "Common Sandbox Parameters" documentation:
> http://hekad.readthedocs.org/en/v0.10.0b1/config/common_sandbox_parameter.html#config-common-sandbox-parameters
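For reference, those limits are plain settings in the sandbox plugin's
TOML section; the values here are illustrative, not recommendations:

```toml
[influx_accumulator]
type = "SandboxFilter"
filename = "lua_filters/influx_accumulator.lua"
message_matcher = "Type == 'logfile'"
ticker_interval = 5
memory_limit = 16777216      # bytes; raise for large batches
instruction_limit = 1000000  # Lua instructions per function call
output_limit = 1048576       # bytes; max size of an injected payload
```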
>
>> Moreover, my
>> HttpOutput then started generating timeout errors communicating with
>> InfluxDB.
>>
> Not sure what was causing this.
>
>>   Not being well versed in Lua & having to develop in a sandbox
>> environment without the aid of any meaningful logging capabilities, this
>> approach was way too unproductive for me to continue developing or
>> debugging further.
>>
> Understood.
>
>> My second attempt uses a native InfluxDB output plugin I created, based
>> on the existing ElasticSearchOutput plugin, with the ability to
>> batch-write records via HTTP.  Changing the HttpOutput in the initial
>> configuration above to this new plugin altered the performance
>> dramatically.  I’m now able to process a single Apache access log in ~
>> 4 minutes.  And, I’ve loaded 31 days of historical Apache logs through
>> Heka -> InfluxDB in under 2 hours.  The number of records I’ve imported
>> exceeds 36 million for each of three distinct time series for a total
>> sum exceeding 108 million records.  The performance of this has far
>> exceeded our expectations and we are now running Heka on a production
>> server.  There’s no appreciable CPU load for doing this, and we’re able
>> to write directly to InfluxDB, eliminating the need to ship logs
>> to a central server as was required with Logstash.
>>
> I'm very glad you've got a solution that has exceeded your expectations.
> :) Clearly, though, you had to work way too hard to do so.
>
> If you're willing to work with me, I'd be very interested in finding out
> what sort of throughput you'd get if you used a batching filter like the
> one I described above along with the HttpOutput. I'd be happy to provide
> you with the source code for such a filter, and to help with the
> configuration to make sure it all works as desired.
>

We don't send nearly as many metrics at once as you do (metrics
are mostly sent to Heka by collectd), so write performance is OK so far.
Using the native InfluxDB encoder, we had issues because, most of the time,
one Heka message equals one InfluxDB data point for us. IIRC, writing a
single point took about 1 ms and, as a result, the InfluxDB output plugin
became the bottleneck, slowing down the pipeline and all the other plugins.
Using the accumulator filter, writing 100 points takes about 10 ms and the
pipeline is doing fine.

HTH,

Simon


>> I have three requests:
>>
>> 1)  I would greatly appreciate having a native InfluxDB output plugin
>> included with future releases of Heka and would like to contribute my
>> work for your review and consideration.
>>
> InfluxDB is widely used enough that I'm open to considering a native
> output plugin, if that's really the only way we can achieve what we want in
> terms of ease-of-use and performance. That's a last resort to me, though. I
> think it's worth experimenting a bit more to see if we can hit our goals
> without it. If the batching filter I describe above works, we can add that
> to the core and we'd need much less new code.
>
>> Whether a separate plugin
>> exists for ElasticSearch/InfluxDB or whether a generalized
>> BatchHttpOutput plugin emerges is worth considering.
>>
> A BatchHttpOutput that works for both ES and InfluxDB is much more
> attractive to me than separate plugins dedicated to each.
>
>> The differences
>> between the ElasticSearchOutput plugin and my modified InfluxDB plugin
>> are minimal.  First, the ElasticSearch plugin assumes a fixed
>> endpoint (/_bulk) whereas InfluxDB relies on a query string.  Second,
>> ElasticSearch returns a JSON response whereas InfluxDB returns an HTTP
>> status of 204 - no content.  Both ElasticSearch & InfluxDB support TLS &
>> UDP though I’ve not tested either of these features with InfluxDB.
>> Differences beyond these are minor.
>>
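(To illustrate the endpoint difference with the existing HttpOutput: the
InfluxDB target carries the database in the query string, while ES posts
to a fixed bulk path. Both addresses below are examples, not taken from
the original setup:)

```toml
# InfluxDB 0.9.x: database and precision go in the query string.
[influx_out]
type = "HttpOutput"
address = "http://localhost:8086/write?db=apache&precision=ms"

# ElasticSearch: the bulk API lives at a fixed /_bulk endpoint that the
# plugin targets itself; only the server is configured.
[es_out]
type = "ElasticSearchOutput"
server = "http://localhost:9200"
```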
> Ideally, UDP would be handled by a different output; supporting both
> protocols adds a lot of (IMO unhelpful) complexity to the ES output. If the
> batching at the filter level works well, it will work just as well with
> UdpOutput as with HttpOutput.
>
>> 2) I’ve found one problem with my output plugin that appears unrelated
>> to my changes or InfluxDB and most likely exists for ElasticSearch as
>> well.
>>
>> While using the LogstreamerInput to read a single file & using my
>> InfluxOutput (c.f. ElasticSearchOutput) with: 'use_buffering = true’,
>> everything works fine.
>>
>> When using the LogstreamerInput to read multiple files having a file
>> match pattern/priority I have to turn off buffering or I receive the
>> following errors:
>>
>> 2015/08/24 14:56:43 Diagnostics: 1 packs have been idle more than 120
>> seconds.
>> 2015/08/24 14:56:43 Diagnostics: (input) Plugin names and quantities
>> found on idle packs:
>>
> Are there any subsequent lines that tell you which plugins have the idle
> pack?
>
>>  From a previous discussion, it would appear there’s a deadlock
>> occurring.  Please advise on how to debug this further.
>>
> Hrm, this is confusing. The LogstreamerInput code and the router layer
> buffering code have absolutely nothing to do with each other. I don't
> understand how this error could be related to an interaction between those
> two settings. I don't have any debugging suggestions (other than looking at
> the surrounding log lines, per my question above), but if you open an issue
> with a way I can reproduce the error I'd be happy to take a look.
>
>> 3)  While attempting to develop customizations via the Lua Sandbox, the
>> only practical logging facility I could use was add_to_payload().  But
>> that was out of scope from within lua_modules/.
>>
> Hrm, surprising. You should be able to get to the standard API functions
> even from within modules. Maybe you were in a module that had excluded the
> global namespace?
>
>> I would like to know how best
>> to relax the sandbox restrictions to gain access to the Lua io library
>> so I can capture output to stdout/log files.
>>
> The sandbox initialization parameters for decoder, filter, and encoder
> plugins can be found here:
>
>
> https://github.com/mozilla-services/heka/blob/dev/sandbox/lua/lua_sandbox.go.in#L53
>
> You can temporarily allow blocked entries by editing that and rebuilding.
> If you remove `'print'` from the list on line 60, for instance, you'll be
> able to use print in your code.
>
>> Or, in general,
>> what advice do you offer on how best to develop/debug code in the Lua
>> Sandbox?
>>
> Usually I can get enough debugging context just by returning errors with
> error messages, or emitting messages with debug output using inject_message
> or inject_payload.
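For what it's worth, that pattern looks something like this inside a
sandbox plugin (the Timestamp check is just an example):

```lua
function process_message()
    local ts = read_message("Timestamp")
    if type(ts) ~= "number" then
        -- error() aborts processing with a message that shows up in
        -- hekad's log output.
        error("unexpected Timestamp: " .. tostring(ts))
    end
    -- Or route debug info out as a payload, e.g. to a LogOutput via a
    -- PayloadEncoder.
    inject_payload("txt", "debug", string.format("ts=%d\n", ts))
    return 0
end
```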
>
>> Thanks,
>>
> Thank you, hope this was helpful. Hopefully you'll be willing to try out
> the batch influx filter...
>
> -r
>
> _______________________________________________
> Heka mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/heka
>