[GitHub] [pinot] somandal commented on a diff in pull request #9117: Emit a server side metric to identify if MV columns are present as part of the selection order by queries

GitBox Thu, 28 Jul 2022 15:30:51 -0700


somandal commented on code in PR #9117:
URL: https://github.com/apache/pinot/pull/9117#discussion_r932717509



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java:
##########
@@ -123,7 +123,12 @@ enum MetadataKey {
     NUM_SEGMENTS_PRUNED_BY_LIMIT("numSegmentsPrunedByLimit", 
MetadataValueType.INT),
     NUM_SEGMENTS_PRUNED_BY_VALUE("numSegmentsPrunedByValue", 
MetadataValueType.INT),
     
EXPLAIN_PLAN_NUM_EMPTY_FILTER_SEGMENTS("explainPlanNumEmptyFilterSegments", 
MetadataValueType.INT),
-    
EXPLAIN_PLAN_NUM_MATCH_ALL_FILTER_SEGMENTS("explainPlanNumMatchAllFilterSegments",
 MetadataValueType.INT);
+    
EXPLAIN_PLAN_NUM_MATCH_ALL_FILTER_SEGMENTS("explainPlanNumMatchAllFilterSegments",
 MetadataValueType.INT),
+    // TODO: Add a mechanism to pass generic metadata without adding them to 
the DataTable's metadata
+    // This metadata field is only required for emitting a metric to identify 
whether queries using MV columns (as
+    // identifiers or via transform) are present in the Selection Order-By 
queries. This was the only place where
+    // such a metadata could be added in the existing interfaces even though 
we don't need this on the Broker side.
+    QUERY_HAS_MV_SELECTION_ORDER_BY("queryHasMVSelectionOrderBy", 
MetadataValueType.STRING);

Review Comment:
   hey @Jackie-Jiang, @siddharthteotia and I discussed the `ServerMetrics` 
change and do think it is a good idea in general. One problem is that we need 
an additional way to correlate a query with this metric, and for that purpose 
we actually extended a log in `QueryScheduler` in this existing PR to capture 
the value of this flag at the query level associated at the request ID. This is 
especially important for high QPS use cases where we may need to comb through 
hundreds of logs on the Broker side to find the possible offending queries. In 
addition to that, it may not be desirable to emit a metric for every segment 
(since the operators are at a segment level) and instead just emit once per 
query.
   
   Thinking some more we have a couple of ideas to address this (have both a 
log and a metric and avoid doing this for every segment belonging to the query):
   
   Option 1 (preferred method):
   
   - Modify the `QueryContext` to also contain the request ID from the 
`ServerQueryRequest`. This will then be available throughout all the plan nodes 
and operators. We can then use this request ID for logging purposes temporarily.
   - To avoid emitting a metric and log in the `SelectionOrderByOperator` but 
instead do this at the `InstanceResponseOperator` level where we will have the 
combined `IntermediateResultsBlock` available. This will allow us to emit the 
metric and log only once per query per server. The `SelectionOrderByOperator` 
will still detect this scenario and set it in the `IntermediateResultsBlock` as 
is being done in the current PR.
   - For now we can avoid modifying all the operators and plan nodes to take 
the `ServerMetrics` as input and instead only pass it down up to the 
`InstanceResponsePlanNode` and `InstanceResponseOperator`. In the future it 
should be trivial to extend all operators and plan nodes to have the 
`ServerMetrics` if the need arises.
   
   Option 2
   
   - Add a transient metadata object to the `IntermediateResultsBlock` and 
potentially `DataTable` where we can store all the transient metadata fields 
which aren't serializable and will be ignored when creating the serialized 
`DataTable`. The drawback with this approach is that certain operators like 
Filter don't use `IntermediateResultsBlock` directly and we'll need to add 
getters/setters to all the intermediate block types and copy these over. We 
still only want to emit the metric and log just once per query.
   
   
   Let us know your thoughts @Jackie-Jiang



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] somandal commented on a diff in pull request #9117: Emit a server side metric to identify if MV columns are present as part of the selection order by queries

Reply via email to