All sounds good to me.

For timing data I meant the capture of the actual time value used to send down 
to metrics collection, I think this would require what you mention as a hybrid 
approach which is a fine starting point.

Thanks!
Tyson


On Sep 28, 2017, at 4:45 AM, Vadim Raskin 
<[email protected]<mailto:[email protected]>> wrote:

Hi Tyson,

Re: logging.
Akka logging is indeed async, but the performance of the whole actor system
depends on the attached logging adapter. We didn't dig into the details of
current log4j adapter, since we quickly replaced it with logback with Async
backend. It's performance was dependent on the queue size, if you are able
to make it full, the next requests to add new messages are blocking. You
are right logback should drop messages if 80% of the queue is reached, but
I doubt that accidentally dropping some logs is a good practice, don't
think we are going to leave that default.


Re: Kamon.
If you mean timing data in terms of statsd format, one could send them
using either histogram or gauge type in Kamon. However, histogram would be
more preferable I think, since gauge sent only the latest value is being
received in between of two flush events afaik.

I don't mind first to have a hybrid approach using the measurement from
TransactionId and pass those data to Kamon then. The main point here is to
reduce the number of string interpolations and IO operations, thus it
should be good enough.

Re: first steps.

I think it would make sense to proceed with the following steps:

- replace log4j with Logback. (could be orthogonal to the main problem, but
nice to have the logger recommended by Akka)
- Get rid of log markers or make them optional. If we have an efficient way
to add new metrics, we could replace some logs with metrics.
- reduce the number of logs.

regards, Vadim.


On ср, 27 сент. 2017 г. at 19:44 Tyson Norris 
<[email protected]<mailto:[email protected]>>
wrote:

Thanks Vadmin!

A few comments:

- RE: async logging:  I thought the akka logger was already async? This is
different than the logback async appender. I’m OK with using the async
appender, but people should know that it by default is configured to ignore
events once its buffer is full, so its possible to get into a situation
where log messages “go missing” if faced with sufficient traffic.

- RE: kamon: It’s not clear to me that kamon supports timing data for
general methods, although it looks like it keeps timing data for akka
message processing.
Currently the log markers in OW keep timing data for arbitrary blocks of
code (unrelated to akka). It looks like to get “time in method” metrics you
will still need to measure the time, and then record that time using kamon.

I know Sandeep is looking into using the timing/tracing aspect as part of
this PR 
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-openwhisk%2Fpull%2F2282&data=02%7C01%7C%7C9c1f88e30cce4ee4b9e508d50666752b%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636421959483500369&sdata=beEuCVDfZO7ubvc5AfShLeVS05OvC4SEXekoGlg5aj8%3D&reserved=0.

It may be interesting to consider kamon TraceContext for this, but it is
also somewhat invasive - seems like it requires wrapping all Future usage
as in:
Tracer.withNewContext("sample-trace") {
   Future {}
}

Is there a way to use this without this wrapping?

BTW - I agree that any “counter” type metrics should be replaced and kamon
seems good for this, I’m just not sure about the timing markers.

- RE: first steps: Does changing log library or appender config imply the
need to change the current marker/timing impl? I think a few logical steps
are:
- replace logging impl (with logback)
- reduce log generation
- replace log marker counter impl (with kamon?)
- replace logging approach (with TraceContext or similar, if an option
even exists, with the goal being to NOT pass whisk Logging around
everywhere)

(I’m really not sure about the last one, but it would be nice to a less
invasive way to emit logs with context)

Thanks
Tyson






On Sep 27, 2017, at 7:39 AM, Vadim Raskin 
<[email protected]<mailto:[email protected]><mailto:
[email protected]<mailto:[email protected]>>> wrote:

Hi OpenWhisk community,

while doing our profiling tests on the controller we have found that the
way we log and the logging per se make a huge impact on the controller
thoughput. In a short, the more parallel activations you have, the more
blocking threads waiting for the logger will be there. After doing several
experiments we found a couple of action items:
* replace the logging library (e.g. to logback), use an async appender and
increase the size of the LogAppender buffer.
* reduce the number of logs that we generate.

Since the first action item will only provide a partial relief and postpone
the point when we start observing the degradation, reducing the number of
logs seems to be a more plausible way to go.

One of the first points I’d like to address are the metrics that we append
to log messages in a form of markers.

The initial idea is to send them directly to statsd or some other backend.
For example Kamon library is a good candidate:
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fkamon.io&data=02%7C01%7C%7Cfa62bc6102964a52294908d505b59165%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636421199745973146&sdata=K8azfpdoqTeRzQrhB1LhBY76z5DrTSGyg7DuBmLMJZ8%3D&reserved=0

Here is the issue I’ve opened some time ago.

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-openwhisk%2Fissues%2F2780&data=02%7C01%7C%7Cfa62bc6102964a52294908d505b59165%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636421199745973146&sdata=QTe%2BNE3YuQpULGcZit%2FNPMAZ0EU9oJ56Yojgyrrs3rU%3D&reserved=0

Do you have any suggestion/concerns regarding that?

Thank you.

regards, Vadim.

Reply via email to