candidates endpoints not respecting broker partition pruning - range partition (druid)

via GitHub Mon, 01 Apr 2024 10:59:16 -0700


ColeAtCharter opened a new issue, #16222:
URL: https://github.com/apache/druid/issues/16222


   This is to report unexpected behavior where the broker selects too many 
segments to query.  The identified scenario is when secondary partition 
information exists (eg, for range partitioning) but is not being used in 
conjunction with a query filter which corresponds to the secondary partitions.  
The result of not completely pruning segments at the broker is a detrimental 
impact on system operations such as for performance and cost.
   
   ### Affected Version
   
   28
   
   ### Description
   
   The /druid/v2 and /druid/v2/candidates broker endpoints are returning 
segments which should be filtered out based on secondary partition metadata.  
Returning unneeded segments for planning and executing queries will cause 
unnecessary I/O throughout the system, causing avoidable detriment to cost and 
performance.
   
   ##### Steps to reproduce:
   
   1. create segments with range partitioning, mark as used, load onto 
historicals.  Testing will require multiple segments per combination of 
datasource and time chunk.
   2. Identify the partition column value(s) corresponding to a single segment 
(within a datasource/time chunk)
   3. submit /v2 and /v2/candidates requests to the broker for the datasource 
and time chunk identified.  Add a query filter using a partition value 
previously identified for the first partition column.
   
   ##### Expected/observed behavior summary
   - The expected behavior is that the response will return only the segment 
identified and no others for the loaded datasource/time chunk.
   - The observed behavior is that the response will include additional 
segments for the loaded datasource/time chunk.
   
   Testing notes
   - Testing used a segment where the partition/filter column was not a start 
or end (ie, not indicated with a negative or positive infinity value for the 
column).  Further, to reduce ambiguity, a partition filter value was chosen 
that was not the first or last for the column in the segment.
   - The test setup had multiple partition columns in the queried segments but 
only filtered on the first partition column
   - Testing both included and omitted the query context value 
"secondaryPartitionPruning"
   - Some testing included "bySegment" in the query context to assess broker 
logic for selecting segments
   
   
   ### Representation of the test setup
   
   ##### segments for "src1"
   - there are multiple segments for the tested time chunk
   ```
   segment_id
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2
   ```
   
   ##### server segments for datasource "src1"
   - 2x replication - similar to a typical production setup, but not strictly 
required for testing
   ```
   segment_id,server
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z,host1:8283
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z,host2:8283
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1,host1:8283
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1,host3:8283
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2,host2:8283
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2,host3:8283
   ```
   
   ##### segment metadata - range partition column values
   - for testing we can assume that partitionDimensions are the first columns 
in dimensions, but that should not be strictly necessary
   ```
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z:
 partitionDimensions: ["col_1","col_2"], start: [-inf, -inf], end: 
["value1ghi","value2tuv"]
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1:
 partitionDimensions: ["col_1","col_2"], start: ["value1ghi","value2jkl"], end: 
["value1lmn", "value2mnop"]
   
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2:
 partitionDimensions: ["col_1","col_2"], start: ["value1lmn", "value2qrs"], 
end: [+inf, +inf]
   ```
   
   ##### test query
   - select the identified datasource/time chunk and filter on the first 
partition column using a value matching only one of the multiple segments
   ```
   {
     "queryType": "scan",
     "dataSource": "src1",
     "intervals": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
     "columns": ["__time", "col_1", "col_15"],
     "filter": {
       "type": "selector",
       "dimension": "col_1",
       "value": "value1jkl"
     },
     "context": {"secondaryPartitionPruning": true}
   }
   ```
   
   ##### Expected result example - /v2/candidates
   ```
   [
     {
       "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
       "version": "2024-04-01T23:59:59.999Z",
       "partitionNumber": 1,
       "size": 999999,
       "locations": [
         {
           "name": "host1:8283",
           "host": null,
           "hostAndTlsPort": "host1:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "_default_tier",
           "priority": 0
         },
         {
           "name": "host3:8283",
           "host": null,
           "hostAndTlsPort": "host3:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "tier2",
           "priority": 0
         }
       ]
     }
   ]
   ```
   
   If partition pruning is reflected at the broker, it would be unexpected to 
receive back any of "the other two segments" (regardless of server/replication)
   
   ##### Unexpected result example 1: /druid/v2 endpoint - extra (but not all) 
segments are returned
   - expected only 
`src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1`
   - (FYI) unexpected segments are loaded on different servers than the 
expected segments
   - the test adds "bySegment" to the query context
   
   ```
   [
       {
           "timestamp": "2024-04-01T09:00:00.000Z",
           "result": {
               "results": [
                   {
                       "segmentId": 
"src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1",
                       "columns": ["__time", "col_1", "col_15"],
                       "events": [
                           {"__time": 1739667600000, "col_1": "value1jkl", 
"col_15": "hello"},
                           {"__time": 1739668980000, "col_1": "value1jkl", 
"col_15": "world"},
                           {"__time": 1739668980000, "col_1": "value1jkl", 
"col_15": "hello"},
                           {"__time": 1739670360000, "col_1": "value1jkl", 
"col_15": "world"}
                       ],
                       "rowSignature": [
                           {"name": "__time", "type": "LONG"},
                           {"name": "col_1", "type": "STRING"},
                           {"name": "col_15", "type": "STRING"}
                       ]
                   }
               ],
               "segment": 
"src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1",
               "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z"
           }
       },
       {
           "timestamp": "2024-04-01T09:00:00.000Z",
           "result": {
               "results": [],
               "segment": 
"src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z",
               "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z"
           }
       }
   ]
   
   ```
   
   ##### Unexpected result example 2: /druid/v2/candidates endpoint - all 
segments are returned
   - expected only 
`src1_2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1`
   
   ```
   [
     {
       "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
       "version": "2024-04-01T23:59:59.999Z",
       "partitionNumber": 0,
       "size": 999999,
       "locations": [
         {
           "name": "host1:8283",
           "host": null,
           "hostAndTlsPort": "host1:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "_default_tier",
           "priority": 0
         },
         {
           "name": "host2:8283",
           "host": null,
           "hostAndTlsPort": "host2:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "tier2",
           "priority": 0
         }
       ]
     },
     {
       "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
       "version": "2024-04-01T23:59:59.999Z",
       "partitionNumber": 1,
       "size": 999999,
       "locations": [
         {
           "name": "host1:8283",
           "host": null,
           "hostAndTlsPort": "host1:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "_default_tier",
           "priority": 0
         },
         {
           "name": "host3:8283",
           "host": null,
           "hostAndTlsPort": "host3:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "tier2",
           "priority": 0
         }
       ]
     },
     {
       "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
       "version": "2024-04-01T23:59:59.999Z",
       "partitionNumber": 2,
       "size": 999999,
       "locations": [
         {
           "name": "host2:8283",
           "host": null,
           "hostAndTlsPort": "host2:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "_default_tier",
           "priority": 0
         },
         {
           "name": "host3:8283",
           "host": null,
           "hostAndTlsPort": "host3:8283",
           "maxSize": 999999999999,
           "type": "historical",
           "tier": "tier2",
           "priority": 0
         }
       ]
     }
   ]
   ```
   
   ##### Summary
   
   The broker should only select and/or query the smallest number of segments 
matching on datasource/time chunk and partition dimensions when it has enough 
has enough partition information about the used/loaded segments to prune out 
segments that don't match on partition dimensions (in addition to those 
segments not matching on datasource or time chunk)
   
   ##### Related documentation
   - https://druid.apache.org/docs/latest/ingestion/partitioning/
   - 
https://imply.io/blog/real-time-analytics-database-uses-partitioning-and-pruning-to-achieve-its-legendary-performance/
   - 
https://imply.io/developer/articles/multi-dimensional-range-partioning-in-druid/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

[I] /v2 and /v2/candidates endpoints not respecting broker partition pruning - range partition (druid)

Reply via email to