paul-rogers opened a new pull request #12058:
URL: https://github.com/apache/druid/pull/12058


   Provides a *response trailer* mechanism to include a map of values in the 
query response after the usual data results.
   
   ### Motivation
   
   Druid runs queries using a REST request/response protocol in which the 
response contains the query result set. There are several use cases where it 
would be useful to return additional performance data along with the result 
set. A concrete example is the [query 
profile](https://github.com/apache/drill/issues/2350) which can optionally 
return the profile when a query runs. Further, the query profile mechanism uses 
the response trailer to return a partial profile from each data node during 
execution.
   
   When not returning a full profile, it is still useful to return a small set 
of overall metrics. This PR includes only those summary metrics.
   
   Since query metrics are available, by definition, only at the completion of 
a query, it is necessary to return metrics and profile information *after* the 
data. Thus, the existing before-the-data header mechanism is not a good fit for 
this use case and we instead introduce a *response trailer* mechanism.
   
   ### Credit
   
   This PR and description are derived from a prototype which @gianm created.
   
   ### Design
   
   When running a query with a JSON response format, the response is just the 
response set as an array of rows, where the rows can be in various formats. 
When the response trailer is enabled, the JSON response format changes to an 
object which includes the result set as one of the fields.
   
   This change is clearly highly visible to clients which may expect the JSON 
array format. To avoid breaking such clients, the client must specifically 
request the revised format.
   
   #### `ResponseContext` changes
   
   The `ResponseContext` gains a new field, `metrics` to gather the summary 
metrics. The field contains the object that generates the metrics shown in the 
API below. On the Broker, the `ResponseContext` may contain metrics for 
multiple subqueries.
   
   A previous PR clearly identified those `ResponseContext` fields which are to 
appear in the response header. Those fields not in the header are considered 
internal fields. This PR extends the idea to tag keys that appear in the 
trailer. The result is that fields can appear in the header, trailer, both or 
neither. ("Neither" is useful for keys that are only used internally by query 
engines within one server, like `timeoutAt` and `count`.)  
   
   The new `metrics` key is tagged as trailer-only.
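
The tagging scheme above can be sketched as follows. This is only an illustration: `Visibility`, `ContextKeys` and `trailerView` are hypothetical names, not the actual `ResponseContext` API, and the per-key tags are assumptions read off this description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Druid's ResponseContext API.
enum Visibility {
  NONE(false, false),
  HEADER(true, false),
  TRAILER(false, true),
  HEADER_AND_TRAILER(true, true);

  final boolean inHeader;
  final boolean inTrailer;

  Visibility(boolean inHeader, boolean inTrailer) {
    this.inHeader = inHeader;
    this.inTrailer = inTrailer;
  }
}

class ContextKeys {
  // Assumed tags for the keys mentioned in this description.
  static final Map<String, Visibility> KEYS = new LinkedHashMap<>();

  static {
    KEYS.put("uncoveredIntervals", Visibility.HEADER_AND_TRAILER);
    KEYS.put("missingSegments", Visibility.HEADER_AND_TRAILER);
    KEYS.put("metrics", Visibility.TRAILER);
    KEYS.put("timeoutAt", Visibility.NONE);
    KEYS.put("count", Visibility.NONE);
  }

  // Keeps only entries whose key is tagged for the trailer.
  // Unknown keys are treated as internal (NONE).
  static Map<String, Object> trailerView(Map<String, Object> context) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : context.entrySet()) {
      Visibility v = KEYS.getOrDefault(e.getKey(), Visibility.NONE);
      if (v.inTrailer) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }
}
```

Treating unknown keys as internal is a conservative default chosen for this sketch, so that a newly added key is not exposed to clients by accident.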
   
   #### "Starter" metrics
   
   Until the full query profile is available, this PR motivates the response 
trailer by providing some basic metrics within the `metrics` key:
   
   * `resultRows`: Number of rows returned
   * `queryStart`: Query start time
   * `queryMs`: Query duration (up until the trailer is written)
   
   In addition, the trailer contains fields which may have been truncated in 
the header:
   
   * `uncoveredIntervals`
   * `missingSegments`
   
   #### `QueryLifecycle` and `SqlLifecycle` changes
   
   The `QueryResource` and `SqlResource` classes process each query, return the 
results, log data, return the response trailer (in this PR) and, later, will 
assemble and return the profile.
   
   To better organize this work, the lifecycle classes each gain a 
`LifecycleStats` that gathers and distributes the above statistics.
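
As a rough sketch of how such a stats holder might gather the three starter metrics, consider the following. `StarterStats` is a hypothetical stand-in, not the `LifecycleStats` class from this PR, and the injectable clock is an assumption made to keep the example testable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a lifecycle stats holder.
class StarterStats {
  private final LongSupplier clockMs;
  private final long queryStart;
  private long resultRows;

  StarterStats(LongSupplier clockMs) {
    this.clockMs = clockMs;
    // Record the start time when the lifecycle begins.
    this.queryStart = clockMs.getAsLong();
  }

  // Called as results are written to the response.
  void addRows(long n) {
    resultRows += n;
  }

  // Called just before the trailer is written, so queryMs excludes
  // the time spent writing the trailer and flushing the connection.
  Map<String, Object> snapshot() {
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("resultRows", resultRows);
    m.put("queryStart", queryStart);
    m.put("queryMs", clockMs.getAsLong() - queryStart);
    return m;
  }
}
```

In production code the clock would typically be `System::currentTimeMillis`.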
   
   #### `QueryResponse`
   
   The SQL-related code to run a query consists of many layers of functions 
which take the query as input and return a sequence. The response context is 
created several layers down in the stack. To return the trailer, we must return 
the response context up through those top layers that today return only a 
sequence. The existing `QueryResponse` is pressed into service to solve this 
issue by wrapping the output sequence and the query profile.
   
   The profile is made available in the `QueryResponse`, but it is not fully 
populated until the root `Sequence` returns all its results.
   
   #### `Query` changes
   
   Druid queries return a variety of results. The objects returned by a 
`Sequence` may not represent rows, but rather query-specific batches of rows. 
To obtain a row count, we must know how to interpret those results. The new 
`Query.getRowCountOf()` method does this work.
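
A minimal sketch of the idea, assuming a default of one row per returned object that batch-oriented queries override. Only the method name `getRowCountOf` comes from this description; `RowCounting`, `RowPerResultQuery`, `BatchQuery` and `RowCounter` are hypothetical.

```java
import java.util.List;

// Hypothetical stand-in for the relevant part of the Query interface.
interface RowCounting<T> {
  // Default: each object from the sequence represents one row.
  default int getRowCountOf(T result) {
    return 1;
  }
}

// A query type whose sequence yields one object per row.
class RowPerResultQuery implements RowCounting<String> {
}

// A query type whose sequence yields batches of rows.
class BatchQuery implements RowCounting<List<String>> {
  @Override
  public int getRowCountOf(List<String> batch) {
    return batch.size();
  }
}

class RowCounter {
  // Sums the per-object row counts over everything the sequence returned.
  static <T> long countRows(RowCounting<T> query, Iterable<T> results) {
    long total = 0;
    for (T r : results) {
      total += query.getRowCountOf(r);
    }
    return total;
  }
}
```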
   
   #### `DirectDruidClient` changes
   
   The internal Druid client uses the new results-with-trailer format. It 
copies the trailer fields into the client-side response context. The 
Broker-side merge merges results from many clients into the final response 
context.
   
   ### Rationale
   
   The design does not use HTTP trailers. In theory the response context could 
have been sent using HTTP trailers. However, client and proxy support for HTTP 
trailers is spotty, and it isn't clear how much data is safe to send in a 
trailer. For these reasons, we chose to include the trailer in the regular 
response.
   
   ### API
   
   #### Request
   
   In native, include the header `X-Druid-Include-Trailer: Yes`:
   
   ```text
   POST /druid/v2/
   X-Druid-Include-Trailer: Yes
   
   { ... } // regular query
   ```
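
For illustration, a client could build the native request with the JDK's `java.net.http` API (Java 11 or later). This is a sketch only: `TrailerRequests` is a hypothetical helper, and the Broker base URL and the query JSON are supplied by the caller.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds, but does not send, a native query request.
class TrailerRequests {
  static HttpRequest nativeQuery(String brokerBaseUrl, String queryJson) {
    return HttpRequest.newBuilder(URI.create(brokerBaseUrl + "/druid/v2/"))
        .header("Content-Type", "application/json")
        // Opt in to the object-with-trailer response format.
        .header("X-Druid-Include-Trailer", "Yes")
        .POST(HttpRequest.BodyPublishers.ofString(queryJson))
        .build();
  }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.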
   
   In SQL, use the result format `arrayWithTrailer`:
   
   ```text
   POST /druid/v2/sql/
   
   {
     "query": "SELECT COUNT(*) FROM wikipedia",
     "resultFormat": "arrayWithTrailer"
   }
   ```
   
   #### Response
   
   In both native and SQL:
   
   ```json
   {
     "results": [ ... ], // normal results
     "context": { // trailer
       "metrics": [
         {
           "segments": 37,
           "segmentsProcessed": 37,
           "segmentsVectorProcessed": 37,
           "segmentRows": 266472,
           "preFilteredRows": 159739,
           "nodeRows": 1,
           "resultRows": 1,
           "cpuNanos": 5027000,
           "threads": 5,
           "queryStart": 1627010838439,
           "queryMs": 40
         },
         {
           "subQueryId": "ef46510d-668b-45e8-a9ef-9de55949342a_0",
           "segments": 37,
           "segmentsProcessed": 37,
           "segmentsVectorProcessed": 37,
           "segmentRows": 266472,
           "preFilteredRows": 194618,
           "nodeRows": 52,
           "resultRows": 5,
           "cpuNanos": 9541000,
           "threads": 5,
           "queryStart": 1627010838461,
           "queryMs": 11
         }
       ]
     }
   }
   ```
   
   ##### Results
   
   The results key contains the actual query results. In native, it is an array 
that is identical to what you would have received if you did not set the 
`X-Druid-Include-Trailer` header. In SQL it is identical to what you would have 
received if you set the result format to array.
   
   ##### Context
   
   The context key contains the response trailer. Inside the context is a 
metrics key that contains an array of query metric objects. There is one for 
the main query (which will be listed first) and one for each subquery (which 
will be listed in the order that they began execution).
   
   The specific metrics should be considered an Alpha release: they are subject 
to change and will be re-evaluated once the full query profile is available.
   
   Keys that can be present in the current "starter" query metric object are:
   
   `subQueryId`: Present for subqueries, absent for the main query. Useful for 
differentiating subqueries from each other, and for differentiating subqueries 
from the main query.
   
   `segments`: The number of segments that were considered by data servers. 
Includes segments that were not processed because the data server ultimately 
decided to use the per-segment query cache. Does not include segments that were 
pruned by the Broker. Equivalent to the count of the query/segmentAndCache/time 
metric.
   
   `segmentsProcessed`: The number of segments that were actually processed by 
data servers. Does not include segments whose results were served out of the 
per-segment query cache, or segments that were pruned by the Broker. Equivalent 
to the count of the query/segment/time metric.
   
   `segmentsVectorProcessed`: The number of segments that were actually 
processed by data servers, and were processed using a vectorized engine. This 
is a subset of segmentsProcessed.
   
   `segmentRows`: The total number of rows across all segments processed by 
data servers. May include rows that were not actually processed because they 
were filtered out using the index. May include rows that were not actually 
processed due to pushed-down limits.
   
   `preFilteredRows`: The total number of rows across all segments processed by 
data servers, after applying index-based filters, but before applying 
cursor-based filters. Will not be larger than segmentRows. This is 
generally the number of rows processed by the query engine, except in cases 
where the query engine does not process all rows (such as SELECT * FROM tbl 
LIMIT 5, where limit pushdown means that data servers only read the first few 
rows).
   
   `nodeRows`: The total number of rows received by the Broker from data 
servers. May be larger than segmentRows and preFilteredRows due to caching: 
nodeRows includes rows served from the per-segment query cache, but segmentRows 
and preFilteredRows do not. May also be larger than preFilteredRows due to 
queries that generate more than one output row per input row.
   
   `resultRows`: The total number of rows returned by this query after 
processing has completed on the Broker. This is how many rows a user would 
receive after issuing the query.
   
   `cpuNanos`: The total number of nanoseconds of CPU time used by this query, 
summed up across all servers. Equivalent to the sum of the query/cpu/time 
metric.
   
   `threads`: The total number of threads that participated in this query. Note 
that not all threads are necessarily active at the same time.
   
   `queryStart`: The time this query started, in milliseconds since the epoch.
   
   `queryMs`: The duration of this query, in milliseconds. This is similar to 
the query/time metric, but not quite equivalent, because it only measures the 
time until the response trailer starts being written. The query/time metric 
includes the time it takes to write the response trailer and fully flush the 
connection to the client.
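
Some of the relationships stated above can be written down as a simple consistency check, which may be handy when testing. `MetricChecks` is a hypothetical helper. It encodes only the bounds given in the key descriptions: vectorized segments are a subset of processed segments, processed segments are a subset of considered segments, and `preFilteredRows` is no larger than `segmentRows`.

```java
import java.util.Map;

// Hypothetical helper: checks the bounds stated in the key descriptions.
class MetricChecks {
  static boolean consistent(Map<String, Long> m) {
    long segments = m.getOrDefault("segments", 0L);
    long processed = m.getOrDefault("segmentsProcessed", 0L);
    long vectorized = m.getOrDefault("segmentsVectorProcessed", 0L);
    long segmentRows = m.getOrDefault("segmentRows", 0L);
    long preFiltered = m.getOrDefault("preFilteredRows", 0L);
    return vectorized <= processed
        && processed <= segments
        && preFiltered <= segmentRows;
  }
}
```

No bound is checked for `nodeRows`, since the description notes it may exceed both `segmentRows` and `preFilteredRows`.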
   
   ### Backward Compatibility
   
   The trailer appears for client queries only if the client explicitly 
requests it via the `X-Druid-Include-Trailer` header. Since existing clients do 
not include this header, existing clients will see no functional difference.
   
   The response trailer is always enabled for internal queries between the 
broker and data nodes, etc. This change is transparent to clients. If a client 
directly queries a data node, then the above rule applies: the 
`X-Druid-Include-Trailer` header must be set.
   
   The addition of the trailer changes the format of internal query responses. 
The following ensures services can be upgraded in any order:
   
   * An old Broker won't request the trailer, so new data nodes will not return 
one.
   * A new Broker will request the trailer, but will accept the old array 
format.
   
   When running test cases, it appears that some tests may mock or otherwise 
simulate the data node response to the broker. In this case, the new Broker 
requests a trailer, but the mocking code returns an array format. The "will 
accept the old array format" ensures that these tests continue to run.
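
One way to accept both formats, sketched under the assumption that the two can be told apart by the first non-whitespace character of the body. `ResponseFormats` is hypothetical, and the PR's actual client code may decide differently, for example by inspecting the first token from a streaming JSON parser.

```java
// Hypothetical sketch: distinguishes the old array format from the
// new object-with-trailer format by the first non-whitespace character.
class ResponseFormats {
  enum Format { ARRAY, OBJECT_WITH_TRAILER }

  static Format detect(String body) {
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (Character.isWhitespace(c)) {
        continue;
      }
      if (c == '[') {
        return Format.ARRAY;
      }
      if (c == '{') {
        return Format.OBJECT_WITH_TRAILER;
      }
      break;
    }
    throw new IllegalArgumentException("Unrecognized response body");
  }
}
```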
   
   ### Minor Revisions
   
   The PR includes a number of minor refactoring moves:
   
   #### `QueryResource`
   
   Pull the `StreamingOutput` subclass out as a named class in `QueryResource` 
so it is easier to see which parts occur within the message handler, and which 
occur asynchronously to produce the output.
   
   #### Optional `druid.server.http.minThreads` config parameter
   
   Add a `minThreads` to the Jetty server configuration to optionally reduce 
the initial number of Jetty worker threads to make the IDE experience a bit 
easier. The default is to retain the current behavior. To change the minimum 
(and startup) number of threads, add the following to the common config file:
   
   ```text
   druid.server.http.minThreads=2
   ```
   
   The thread pool will expand the thread count on demand up to `maxThreads`, 
so this change affects only the *initial* thread count.
   
   #### `SqlResource` error handling
   
   Added a distinct error code, 409 CONFLICT, for the query cancellation case 
to differentiate cancellation from internal error.
   
   ### Tests
   
   Deflaked several tests. Modified others to avoid timeout while debugging. 
Reduced copy/paste revisions for handling the new header field.
   
   <hr>
   
   This PR has:
   - [X] been self-reviewed.
      - [X] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 
   - [ ] added documentation for new or modified features or behaviors.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


