chewbranca opened a new pull request, #4483:
URL: https://github.com/apache/couchdb/pull/4483
<!-- Thank you for your contribution!
Please file this form by replacing the Markdown comments
with your text. If a section needs no action - remove it.
Also remember, that CouchDB uses the Review-Then-Commit (RTC) model
of code collaboration. Positive feedback is represented +1 from
committers
and negative is a -1. The -1 also means veto, and needs to be addressed
to proceed. Once there are no objections, the PR can be merged by a
CouchDB committer.
See: http://couchdb.apache.org/bylaws.html#decisions for more info. -->
## Overview
<!-- Please give a short brief for the pull request,
what problem it solves or how it makes things better. -->
This PR introduces a mechanism for logging metadata about requests and
system activity outside of the normal http log entry. In particular, this work
is meant to provide tooling for logging things like the quantity of work
performed to fulfill a particular request.
For example, given we make a `_find` query like so:
```
(chewbranca)-(jobs:0)-(~)
(! 2793)-> curl -u adm:pass -X POST
localhost:15984/foo/_find?include_docs=true -d '{"selector":{"blah":"blah
blah"},"use_index":"_design/67ca9ba1e20371ee9d7e703c1f04eb1d03521b57","execution_stats":true}'
-H content-type:application/json
{"docs":[
{"_id":"493609ddc70fa41dec7ec28fc3000bf0","_rev":"1-6440c84fecde0669be847bf9639110b5","blah":"blah
blah"},
...*SNIP*...
{"_id":"7ea025f22312db0d494f7a3d0f2614d4","_rev":"1-6440c84fecde0669be847bf9639110b5","blah":"blah
blah"},
{"_id":"7ea025f22312db0d494f7a3d0f2622ee","_rev":"1-6440c84fecde0669be847bf9639110b5","blah":"blah
blah"}
],
"bookmark":
"g1AAAABfeJzLYWBgYMpgSmHgKy5JLCrJTq2MT8lPzkzJBYormKcmGhiZphkZGRsapSQZpJhYmqSZJxqnGKQZmRkZpaaC9HHA9OUAdTCCtHEm5SRmKICIrCwAuMMZsw",
"execution_stats":
{"total_keys_examined":0,"total_docs_examined":1243,"total_quorum_docs_examined":0,"results_returned":1243,"execution_time_ms":190.988}}
```
This PR will generate a log entry report of the form:
> [report] 2023-03-20T23:39:03.763470Z [email protected] <0.7071.0> ecded843da
mango-stats [type="mango-stats" total_quorum_docs_examined="0"
total_keys_examined="0" total_docs_examined="1243" results_returned="1243"
nonce="ecded843da" execution_time_ms="190.988"]
The report metadata is logged in format compliant with
[RFC5424](https://www.rfc-editor.org/rfc/rfc5424.html#section-6.3) such that
syslog can pickup the metadata entries. This allows us to extend what metadata
is introduced over time in such a way that parsing at the log/syslog level will
not need further modifications for new data. This means we can easily introduce
additional data logging around volume or IO operations or other statistics at a
lower granularity than the doc level.
It could be argued that this data should be directly coupled with the
underlying http log, and I offer two counterpoints:
1) currently `mango_crud:find/5` returns an accumulated response, rather
than a request handle, so we lose the metadata generated inside `mango` when it
passes back into the `chttpd` layer, hence `chttpd` no longer has the relevant
stats to include in the log when we finally log the post http request entry in
`chttpd`. We _could_ modify the relevant code paths to introduce a data
structure containing both the response and metadata values allowing us to
funnel data to be logged up from lower levels in the system than chttpd; I
think this is potentially worthwhile, but I also think item 2) below makes this
less of an immediate concern.
2) I believe strongly we need the ability to collect data about the
overhead performed by a particular task and then log that data such that we can
more clearly articulate what CouchDB is doing as a function of time. This is
certainly incredibly important for http requests that generate load to
understand expensive requests, but also just as important for background
operations as we have some blind spots in the amount of `couch_btree:foldl`
operations induced per operation, be it a view/_all_docs or a background job
like compaction of view building. Therefore we have areas to generate these
report logs where there's no existing http request context to hook into.
Furthermore, I think it's worthwhile to be able to record stats along these
lines on the RPC side rather than funneling all the data back to the
coordinator to do the logging there.
I do think there's merit to 1) above, however, I think 2) necessitates
report logging outside of an http context so I'm not convinced we should do 1)
as part of this PR. That said, feedback is welcome.
<hr />
The current structure of this PR is functional and allows for a hybrid
approach for the report logging such that the RFC5424 metadata can be
introduced _along with_ an additional arbitrary string message afterwards.
While working through the logic here, in practice, supporting the hybrid
approach complicates the underlying logic in the codebase and also seems to
complicate the ability for RFC5424 parsers to extract the metadata and parse
the log entry. For example, here's a [RFC5424 parser Splunk
plugin](https://splunkbase.splunk.com/app/978) I've been looking at and one of
the caveats is:
> The MSG section of the event, if it exists, is not parsed by this app.
and a sample message from RFC5424 demonstrating structured metadata followed
by a string `MSG`:
```
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com
evntslog - ID47 [exampleSDID@32473 iut="3" eventSource=
"Application" eventID="1011"] BOMAn application
event log entry...
```
So it's a valid log entry to introduce structured metadata _and also_ follow
that with an arbitrary log `MSG` string, but like I said this seems to
complicate a lot of things both from a parsing standpoint and the logistics of
short circuiting empty messages and format strings through
`couch_log_formatter`. I'm inclined to remove this functionality, especially
since it feels that in practice, the most useful thing here is logging the
metadata block, rather than an additional `MSG` payload.
I wanted to get broader feedback before going down that route of removing
the additional MSG, so the PR is in the current form outlined above, but I've
got an initial diff below to drop that functionality. This would also cleanup
all the additional logic introduced by `couch_log:log/3 --> couch_log:log/5`,
and so my preferred direction for this PR to go is to apply this diff and then
drop out all support for metadata logging outside of a very specifically
targeted `couch_log:report`. Here's the wip diff:
```diff
diff --git a/src/couch_log/src/couch_log.erl
b/src/couch_log/src/couch_log.erl
index 379326b52..2fb5808bb 100644
--- a/src/couch_log/src/couch_log.erl
+++ b/src/couch_log/src/couch_log.erl
@@ -22,7 +22,6 @@
alert/2,
emergency/2,
report/2,
- report/4,
set_level/1
]).
@@ -51,11 +50,7 @@ alert(Fmt, Args) -> log(alert, Fmt, Args).
emergency(Fmt, Args) -> log(emergency, Fmt, Args).
-spec report(string(), map()) -> ok.
-report(ReportId, Meta) when is_map(Meta) ->
- report(ReportId, "", [], Meta).
-
--spec report(string(), string(), list(), map()) -> ok.
-report(ReportId, Fmt, Args, Meta0) when is_map(Meta0) ->
+report(ReportId, Meta0) when is_map(Meta0) ->
Meta = case maps:is_key(type, Meta0) of
true ->
Meta0;
@@ -63,7 +58,7 @@ report(ReportId, Fmt, Args, Meta0) when is_map(Meta0) ->
Meta0#{type => ReportId}
end,
couch_stats:increment_counter([couch_log, level, report]),
- Entry = couch_log_formatter:format(report, self(), ReportId, Fmt, Args,
Meta),
+ Entry = couch_log_formatter:format_report(self(), ReportId, Meta),
ok = couch_log_server:log(Entry).
-spec set_level(atom() | string() | integer()) -> true.
diff --git a/src/couch_log/src/couch_log_formatter.erl
b/src/couch_log/src/couch_log_formatter.erl
index bf7571975..5b23f7ed5 100644
--- a/src/couch_log/src/couch_log_formatter.erl
+++ b/src/couch_log/src/couch_log_formatter.erl
@@ -22,6 +22,8 @@
format/3,
format/1,
+ format_report/3,
+
format_reason/1,
format_mfa/1,
format_trace/1,
@@ -32,6 +34,28 @@
-define(DEFAULT_TRUNCATION, 1024).
+-define(REPORT_LEVEL, report).
+
+format_report(Pid, Type, Meta) ->
+ MaxMsgSize = couch_log_config:get(max_message_size),
+ Msg = case format_meta(Meta) of
+ "" ->
+ "";
+ MetaStr when length(MetaStr) > MaxMsgSize ->
+ %% TODO: what to do when meta formatted data is too large?
+ error(what_to_do_here);
+ MetaStr ->
+ ["[", MetaStr, "]"]
+ end,
+ #log_entry{
+ level = couch_log_util:level_to_atom(?REPORT_LEVEL),
+ pid = Pid,
+ msg = Msg,
+ msg_id = couch_log_util:get_msg_id(),
+ time_stamp = couch_log_util:iso8601_timestamp(),
+ type = Type
+ }.
+
format(Level, Pid, Fmt, Args) ->
format(Level, Pid, undefined, Fmt, Args, #{}).
```
## Testing recommendations
<!-- Describe how we can test your changes.
Does it provide any behaviour that the end users
could notice? -->
## Related Issues or Pull Requests
<!-- If your changes affect multiple components in different
repositories please put links to those issues or pull requests here.
-->
## Checklist
- [ ] Code is written and works correctly
- [ ] Changes are covered by tests
- [ ] Any new configurable parameters are documented in
`rel/overlay/etc/default.ini`
- [ ] Documentation changes were made in the `src/docs` folder
- [ ] Documentation changes were backported (separated PR) to affected
branches
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]