chewbranca opened a new pull request, #4483:
URL: https://github.com/apache/couchdb/pull/4483

   <!-- Thank you for your contribution!
   
        Please file this form by replacing the Markdown comments
        with your text. If a section needs no action - remove it.
   
        Also remember, that CouchDB uses the Review-Then-Commit (RTC) model
        of code collaboration. Positive feedback is represented +1 from 
committers
        and negative is a -1. The -1 also means veto, and needs to be addressed
        to proceed. Once there are no objections, the PR can be merged by a
        CouchDB committer.
   
        See: http://couchdb.apache.org/bylaws.html#decisions for more info. -->
   
   ## Overview
   
   <!-- Please give a short brief for the pull request,
        what problem it solves or how it makes things better. -->
   
   This PR introduces a mechanism for logging metadata about requests and 
system activity outside of the normal http log entry. In particular, this work 
is meant to provide tooling for logging things like the quantity of work 
performed to fulfill a particular request.
   
   For example, given we make a `_find` query like so:
   
   ```
   (chewbranca)-(jobs:0)-(~)                                                    
                                                                                
                                                                                
                                                                               
   (! 2793)-> curl  -u adm:pass -X POST 
localhost:15984/foo/_find?include_docs=true -d '{"selector":{"blah":"blah 
blah"},"use_index":"_design/67ca9ba1e20371ee9d7e703c1f04eb1d03521b57","execution_stats":true}'
 -H content-type:application/json                                               
                               
   {"docs":[                                                                    
                                                                                
                                                                                
                                                                               
   
{"_id":"493609ddc70fa41dec7ec28fc3000bf0","_rev":"1-6440c84fecde0669be847bf9639110b5","blah":"blah
 blah"},                                                                        
                                                                                
                                                          
   ...*SNIP*...                                                                 
                                                                                
                                                                                
                                                                               
   
{"_id":"7ea025f22312db0d494f7a3d0f2614d4","_rev":"1-6440c84fecde0669be847bf9639110b5","blah":"blah
 blah"},                                                                        
                                                                                
                                                          
   
{"_id":"7ea025f22312db0d494f7a3d0f2622ee","_rev":"1-6440c84fecde0669be847bf9639110b5","blah":"blah
 blah"}                                                                         
                                                                                
                                                          
   ],                                                                           
                                                                                
                                                                                
                                                                               
   "bookmark": 
"g1AAAABfeJzLYWBgYMpgSmHgKy5JLCrJTq2MT8lPzkzJBYormKcmGhiZphkZGRsapSQZpJhYmqSZJxqnGKQZmRkZpaaC9HHA9OUAdTCCtHEm5SRmKICIrCwAuMMZsw",
                                                                                
                                                                                
               
   "execution_stats": 
{"total_keys_examined":0,"total_docs_examined":1243,"total_quorum_docs_examined":0,"results_returned":1243,"execution_time_ms":190.988}}
                                                                                
                                                                                
 
   ```
   
   This PR will generate a log entry report of the form:
   
   > [report] 2023-03-20T23:39:03.763470Z [email protected] <0.7071.0> ecded843da 
mango-stats [type="mango-stats" total_quorum_docs_examined="0" 
total_keys_examined="0" total_docs_examined="1243" results_returned="1243" 
nonce="ecded843da" execution_time_ms="190.988"]
   
   The report metadata is logged in format compliant with 
[RFC5424](https://www.rfc-editor.org/rfc/rfc5424.html#section-6.3) such that 
syslog can pickup the metadata entries. This allows us to extend what metadata 
is introduced over time in such a way that parsing at the log/syslog level will 
not need further modifications for new data. This means we can easily introduce 
additional data logging around volume or IO operations or other statistics at a 
lower granularity than the doc level.
   
   It could be argued that this data should be directly coupled with the 
underlying http log, and I offer two counterpoints:
   
     1) currently `mango_crud:find/5` returns an accumulated response, rather 
than a request handle, so we lose the metadata generated inside `mango` when it 
passes back into the `chttpd` layer, hence `chttpd` no longer has the relevant 
stats to include in the log when we finally log the post http request entry in 
`chttpd`. We _could_ modify the relevant code paths to introduce a data 
structure containing both the response and metadata values allowing us to 
funnel data to be logged up from lower levels in the system than chttpd; I 
think this is potentially worthwhile, but I also think item 2) below makes this 
less of an immediate concern.
   
     2) I believe strongly we need the ability to collect data about the 
overhead performed by a particular task and then log that data such that we can 
more clearly articulate what CouchDB is doing as a function of time. This is 
certainly incredibly important for http requests that generate load to 
understand expensive requests, but also just as important for background 
operations as we have some blind spots in the amount of `couch_btree:foldl` 
operations induced per operation, be it a view/_all_docs or a background job 
like compaction of view building. Therefore we have areas to generate these 
report logs where there's no existing http request context to hook into. 
Furthermore, I think it's worthwhile to be able to record stats along these 
lines on the RPC side rather than funneling all the data back to the 
coordinator to do the logging there.
   
   I do think there's merit to 1) above, however, I think 2) necessitates 
report logging outside of an http context so I'm not convinced we should do 1) 
as part of this PR. That said, feedback is welcome.
   
   <hr />
   
   The current structure of this PR is functional and allows for a hybrid 
approach for the report logging such that the RFC5424 metadata can be 
introduced _along with_ an additional arbitrary string message afterwards. 
While working through the logic here, in practice, supporting the hybrid 
approach complicates the underlying logic in the codebase and also seems to 
complicate the ability for RFC5424 parsers to extract the metadata and parse 
the log entry. For example, here's a [RFC5424 parser Splunk 
plugin](https://splunkbase.splunk.com/app/978) I've been looking at and one of 
the caveats is:
   
   > The MSG section of the event, if it exists, is not parsed by this app.
   
   and a sample message from RFC5424 demonstrating structured metadata followed 
by a string `MSG`:
   
   ```
              <165>1 2003-10-11T22:14:15.003Z mymachine.example.com
              evntslog - ID47 [exampleSDID@32473 iut="3" eventSource=
              "Application" eventID="1011"] BOMAn application
              event log entry...
   ```
   
   So it's a valid log entry to introduce structured metadata _and also_ follow 
that with an arbitrary log `MSG` string, but like I said this seems to 
complicate a lot of things both from a parsing standpoint and the logistics of 
short circuiting empty messages and format strings through 
`couch_log_formatter`. I'm inclined to remove this functionality, especially 
since it feels that in practice, the most useful thing here is logging the 
metadata block, rather than an additional `MSG` payload.
   
   I wanted to get broader feedback before going down that route of removing 
the additional MSG, so the PR is in the current form outlined above, but I've 
got an initial diff below to drop that functionality. This would also cleanup 
all the additional logic introduced by `couch_log:log/3 --> couch_log:log/5`, 
and so my preferred direction for this PR to go is to apply this diff and then 
drop out all support for metadata logging outside of a very specifically 
targeted `couch_log:report`. Here's the wip diff:
   
   ```diff
   diff --git a/src/couch_log/src/couch_log.erl 
b/src/couch_log/src/couch_log.erl
   index 379326b52..2fb5808bb 100644
   --- a/src/couch_log/src/couch_log.erl
   +++ b/src/couch_log/src/couch_log.erl
   @@ -22,7 +22,6 @@
        alert/2,
        emergency/2,
        report/2,
   -    report/4,
        set_level/1
    ]).
    
   @@ -51,11 +50,7 @@ alert(Fmt, Args) -> log(alert, Fmt, Args).
    emergency(Fmt, Args) -> log(emergency, Fmt, Args).
    
    -spec report(string(), map()) -> ok.
   -report(ReportId, Meta) when is_map(Meta) ->
   -    report(ReportId, "", [], Meta).
   -
   --spec report(string(), string(), list(), map()) -> ok.
   -report(ReportId, Fmt, Args, Meta0) when is_map(Meta0) ->
   +report(ReportId, Meta0) when is_map(Meta0) ->
        Meta = case maps:is_key(type, Meta0) of
            true ->
                Meta0;
   @@ -63,7 +58,7 @@ report(ReportId, Fmt, Args, Meta0) when is_map(Meta0) ->
                Meta0#{type => ReportId}
        end,
        couch_stats:increment_counter([couch_log, level, report]),
   -    Entry = couch_log_formatter:format(report, self(), ReportId, Fmt, Args, 
Meta),
   +    Entry = couch_log_formatter:format_report(self(), ReportId, Meta),
        ok = couch_log_server:log(Entry).
    
    -spec set_level(atom() | string() | integer()) -> true.
   diff --git a/src/couch_log/src/couch_log_formatter.erl 
b/src/couch_log/src/couch_log_formatter.erl
   index bf7571975..5b23f7ed5 100644
   --- a/src/couch_log/src/couch_log_formatter.erl
   +++ b/src/couch_log/src/couch_log_formatter.erl
   @@ -22,6 +22,8 @@
        format/3,
        format/1,
    
   +    format_report/3,
   +
        format_reason/1,
        format_mfa/1,
        format_trace/1,
   @@ -32,6 +34,28 @@
    
    -define(DEFAULT_TRUNCATION, 1024).
    
   +-define(REPORT_LEVEL, report).
   +
   +format_report(Pid, Type, Meta) ->
   +    MaxMsgSize = couch_log_config:get(max_message_size),
   +    Msg = case format_meta(Meta) of
   +        "" ->
   +            "";
   +        MetaStr when length(MetaStr) > MaxMsgSize ->
   +            %% TODO: what to do when meta formatted data is too large?
   +            error(what_to_do_here);
   +        MetaStr ->
   +            ["[", MetaStr, "]"]
   +    end,
   +    #log_entry{
   +        level = couch_log_util:level_to_atom(?REPORT_LEVEL),
   +        pid = Pid,
   +        msg = Msg,
   +        msg_id = couch_log_util:get_msg_id(),
   +        time_stamp = couch_log_util:iso8601_timestamp(),
   +        type = Type
   +    }.
   +
    format(Level, Pid, Fmt, Args) ->
        format(Level, Pid, undefined, Fmt, Args, #{}).
   ```
   
   ## Testing recommendations
   
   <!-- Describe how we can test your changes.
        Does it provide any behaviour that the end users
        could notice? -->
   
   ## Related Issues or Pull Requests
   
   <!-- If your changes affect multiple components in different
        repositories please put links to those issues or pull requests here.  
-->
   
   ## Checklist
   
   - [ ] Code is written and works correctly
   - [ ] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in 
`rel/overlay/etc/default.ini`
   - [ ] Documentation changes were made in the `src/docs` folder
   - [ ] Documentation changes were backported (separated PR) to affected 
branches
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to