I think this is a great idea overall, particularly given the number of significant changes that are happening in the codebase between 3.0 and 4.0.
For me the main question is how much overhead is associated with tracing. Can an admin safely configure it to run in production? Is it possible to sample just a small percentage of events? Does the overhead change if no OpenTracing tracer is configured? I also think a full picture here might include the ability to inherit client-provided spans, so an app developer could drill down from her own code into the database internals and figure out why some DB request was unexpectedly slow.

Thanks for starting this discussion.

Cheers, Adam

> On Sep 10, 2019, at 2:32 PM, Ilya Khlopotov <iil...@apache.org> wrote:
>
> Hi,
>
> I wanted to run this idea by the ML to see if there is any interest before
> investing time into preparing a formal RFC.
>
> # Problem statement
>
> Collecting profiling data is very tricky at the moment. Developers have to
> run generic profiling tools which are not aware of CouchDB specifics.
> This makes it hard to do performance optimization work. We need a tool
> that would allow us to get profiling data from specific points in the
> codebase. This means code instrumentation.
>
> # Proposed solution
>
> There is the https://opentracing.io/ project, which provides vendor-neutral
> APIs and instrumentation for distributed tracing. In Erlang it is implemented
> by the https://github.com/Bluehouse-Technology/otter library. The library
> provides a nice abstraction to start/finish tracing spans as well as to add
> tags and log entries to a given span. In the context of CouchDB this means
> that we can do something like the following:
> - start a tracing span on every HTTP request
> - add tags to capture additional information such as "database name" /
>   "name of endpoint" / "nonce"
> - add otter logs in critical parts of the codebase to get profiling data for
>   these points.
>
> Otter is most useful in combination with a [zipkin](https://zipkin.io/)
> compatible server such as [jaeger](https://github.com/jaegertracing/jaeger).
> However, it can be used even without zipkin. It has a configurable set of
> counters, which makes it possible to answer questions like:
> - what kind of requests are slow
> - if we get a slow request (taking longer than the configured threshold),
>   what was the trace (annotated with the time spent between trace points)
> - which function in the trace takes the most time
>
> # Otter API
>
> Otter has multiple APIs which we would choose from on a case by case basis:
> - functional API - the span structure needs to be passed around (we could
>   extend `#http{}`/`#user_ctx{}`/`#db{}`)
> - simple process dictionary API - the span data are stored in the process
>   dictionary
> - multiple span process dictionary API - supports multiple spans per process
> - span id API - starts a process per span
>
> # Roadblocks
>
> One of the problems we would need to solve is how to do multi-node spans. We
> would need this functionality to trace a request from the HTTP endpoint
> handler running on the coordinator to the shard updater process running on
> the storage nodes.
>
> We could either:
> - extend rexi or fabric to pass and aggregate span information, or
> - pass span info explicitly in every fabric function.
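
To make the second of those options concrete (and because it is close to the span inheritance I mentioned above), here is a rough, hypothetical sketch of handing a span across the coordinator/worker boundary using otter's functional API. Everything in it is illustrative: the module, function names and the direct call standing in for rexi are made up, and the otter calls (otter:start/1, otter:start/3, otter:tag/3, otter:log/2, otter:ids/1, otter:finish/1) are the ones described in otter's README, so the exact arities should be double-checked against whatever version we end up vendoring.

```erlang
%% Hypothetical sketch only -- not CouchDB code. The coordinator opens a
%% root span, extracts its ids, and ships them along with the work request;
%% the worker opens a child span under the same trace id so a zipkin/jaeger
%% backend can stitch the two sides together.
-module(trace_handoff_sketch).
-export([coordinator/1, worker/2]).

coordinator(DbName) ->
    Span0 = otter:start("fabric request"),
    Span1 = otter:tag(Span0, "db", DbName),
    {TraceId, ParentId} = otter:ids(Span1),
    %% In real code this pair would ride along in the rexi message or in a
    %% new field of the request context; a direct call stands in for that.
    Result = worker(DbName, {TraceId, ParentId}),
    otter:finish(Span1),
    Result.

worker(DbName, {TraceId, ParentId}) ->
    Span0 = otter:start("shard update", TraceId, ParentId),
    Result = {ok, DbName},              %% placeholder for the real work
    otter:finish(otter:log(Span0, "completed")),
    Result.
```

Whichever mechanism we pick, the data that has to travel is essentially just that {TraceId, ParentId} pair, which is also what accepting client-provided spans at the HTTP layer would boil down to.
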
>
> # Quick demo (warning: very technical content)
>
> The goal of this demo is to demonstrate the value of otter without a zipkin
> server.
>
> ```
> diff --git a/rebar.config.script b/rebar.config.script
> index c38b6e235..c2b162751 100644
> --- a/rebar.config.script
> +++ b/rebar.config.script
> @@ -129,6 +129,11 @@ OptionalDeps = case WithProper of
>          []
>  end,
>
> +ManualDeps = [
> +    {otter, {url, "https://github.com/Bluehouse-Technology/otter"}, {branch, "master"}},
> +    {otter_lib, {url, "https://github.com/Bluehouse-Technology/otter_lib"}, {branch, "master"}}
> +],
> +
>  BaseUrl = "https://github.com/apache/",
>
>  MakeDep = fun
> @@ -152,7 +157,7 @@ end,
>  AddConfig = [
>      {require_otp_vsn, "19|20|21|22"},
>      {deps_dir, "src"},
> -    {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps)},
> +    {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps ++ ManualDeps)},
>      {sub_dirs, SubDirs},
>      {lib_dirs, ["src"]},
>      {erl_opts, [{i, "../"} | ErlOpts]},
> diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
> index 1e1d638be..a7aad5010 100644
> --- a/src/chttpd/src/chttpd.erl
> +++ b/src/chttpd/src/chttpd.erl
> @@ -158,6 +158,7 @@ handle_request(MochiReq0) ->
>
>  handle_request_int(MochiReq) ->
>      Begin = os:timestamp(),
> +    otter_span_pdict_api:start_with_tags("http request", [{"begin_ts", Begin}]),
>      case config:get("chttpd", "socket_options") of
>          undefined ->
>              ok;
> @@ -233,6 +234,7 @@ handle_request_int(MochiReq) ->
>
>      % put small token on heap to keep requests synced to backend calls
>      erlang:put(nonce, Nonce),
> +    otter_span_pdict_api:tag("nonce", Nonce),
>
>      % suppress duplicate log
>      erlang:put(dont_log_request, true),
> @@ -282,6 +284,8 @@ after_request(HttpReq, HttpResp0) ->
>      end,
>      HttpResp2 = update_stats(HttpReq, HttpResp1),
>      chttpd_stats:report(HttpReq, HttpResp2),
> +    otter_span_pdict_api:tag("status", HttpResp2#httpd_resp.status),
> +    otter_span_pdict_api:log("completed"),
> +    otter_span_pdict_api:finish(),
>      maybe_log(HttpReq, HttpResp2),
>      HttpResp2.
>
> diff --git a/src/fabric/src/fabric.erl b/src/fabric/src/fabric.erl
> index 27fa8c045..a1972b445 100644
> --- a/src/fabric/src/fabric.erl
> +++ b/src/fabric/src/fabric.erl
> @@ -56,6 +56,7 @@ all_dbs() ->
>  %% @doc returns a list of all database names
>  -spec all_dbs(Prefix::iodata()) -> {ok, [binary()]}.
>  all_dbs(Prefix) when is_binary(Prefix) ->
> +    otter_span_pdict_api:tag("request_type", "_all_dbs"),
>      Length = byte_size(Prefix),
>      MatchingDbs = mem3:fold_shards(fun(#shard{dbname=DbName}, Acc) ->
>          case DbName of
> ```
>
> ```
> application:start(otter_lib).
> application:start(otter).
> f(Rules), Rules = [
>     {[
>         {greater, otter_span_duration, 10}
>     ],[
>         {snapshot_count, [long_span], [otter_span_name]},
>         send_to_zipkin
>     ]}
> ].
> otter_config:write(filter_rules, Rules).
> otter:counter_snapshot([long_span,"http request"]).
> [{[long_span,"http request"],
>   [{snap_timestamp,{2019,9,10,13,46,43,368208}},
>    {data,{span,1568123203366286,17299637839902614236,
>                "http request",782788946072648712,undefined,
>                [{"begin_ts",{1568,123203,366255}},
>                 {"nonce","a0a1d7c58e"},
>                 {"status",ok}],
>                [],1911}}]}]
> ```
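
As a side note for anyone decoding that snapshot: the `{span, ...}` tuple looks like otter_lib's span layout of {span, Timestamp, TraceId, Name, SpanId, ParentId, Tags, Logs, Duration}, with times in microseconds (so the 1911 above would be roughly 1.9 ms). Assuming that layout holds, a throwaway shell fun can pull the duration out of a snapshot; the field order and unit are my inference from the output shown, not something stated in the thread.

```erlang
%% Throwaway remsh sketch, assuming the span tuple layout
%% {span, Timestamp, TraceId, Name, SpanId, ParentId, Tags, Logs, Duration}
%% with Duration in microseconds. Uses only the otter:counter_snapshot/1
%% call already shown above.
SpanDuration = fun(CounterKey) ->
    [{_Key, Props}] = otter:counter_snapshot(CounterKey),
    {span, _Ts, _TraceId, Name, _SpanId, _ParentId, _Tags, _Logs, DurationUs} =
        proplists:get_value(data, Props),
    {Name, DurationUs}
end.
SpanDuration([long_span, "http request"]).
%% => {"http request", 1911} for the run above
```
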
>
> Unfortunately, the `counter_snapshot` API doesn't return log events. However,
> this problem is very easy to overcome. Here is an example of how to get
> matching spans logged to a log file.
>
> ```
> diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
> index 1e1d638be..5f2d60690 100644
> --- a/src/chttpd/src/chttpd.erl
> +++ b/src/chttpd/src/chttpd.erl
> @@ -33,7 +33,7 @@
>      send_chunked_error/2, send_json/2,send_json/3,send_json/4,
>      validate_ctype/2]).
>
> --export([authenticate_request/3]).
> +-export([authenticate_request/3, zipkin_log/2]).
>
> @@ -1207,6 +1216,10 @@ get_user(#httpd{user_ctx = #user_ctx{name = User}}) ->
>  get_user(#httpd{user_ctx = undefined}) ->
>      "undefined".
>
> +zipkin_log(_URL, EncodedSpans) ->
> +    Spans = otter_lib_zipkin_thrift:decode_spans(EncodedSpans),
> +    couch_log:error("ZIPKIN :: ~p~n", [Spans]).
> +
> ```
>
> ```
> otter_config:write(http_client, {chttpd, zipkin_log}).
> ```
>
> This would produce log events like the following:
>
> ```
> [error] 2019-09-10T18:01:38.758631Z node1@127.0.0.1 <0.4737.0> --------
> ZIPKIN :: [{span,1568138498710679,3212384889493927141,<<"http request">>,345610195655038913,undefined,[{<<"lc">>,<<>>,{<<"otter_test">>,{127,0,0,1},0}},{<<"begin_ts">>,<<"{1568,138498,710639}">>},{<<"path">>,<<"_all_dbs">>},{<<"method">>,<<"GET">>},{<<"nonce">>,<<"c225c6aef1">>},{<<"status">>,<<"ok">>}],[{1568138498712456,<<"completed">>}],1784}]
> ```
>
> As you can see, the timestamp of the `completed` event is included in the
> entry. Also, since it is a function call, we can format the event in any way
> we like.
>
> # Conclusion
>
> - The otter configuration is simple, and the presence of the `http_client`
>   parameter allows us to use otter without a zipkin server.
> - The API is simple, which makes it possible to replace otter with something
>   else if we aren't happy with the implementation.
> - The codebase of otter is concise and easy to re-implement in case we would
>   need to.
>
> Overall I don't think it will be too complicated to introduce the basic
> functionality and extend it later when we need to.
>
> If you agree with the direction, I would appreciate any feedback which would
> help form the requirements for the RFC and PR in the future.
>
> Best regards,
> ILYA (aka iilyak)