clintropolis opened a new pull request, #14319:
URL: https://github.com/apache/druid/pull/14319

   ### Description
   This PR adds a new interface that controls how `SegmentMetadataCache` chooses 
the `ColumnType` for computed SQL schemas when segments disagree on a column's 
type. The policy is exposed as `druid.sql.planner.metadataColumnTypeMergePolicy`. 
The PR also adds a new 'least restrictive type' mode, which chooses the type 
that data across all segments can best be coerced into.
   
   The existing "newest first" behavior remains the default, primarily because 
the new mode changes when schema migrations take effect for the SQL schema. 
With `{"type":"newestFirst"}`, the SQL schema is updated as soon as the first 
job with the new schema has published segments; with 
`{"type":"leastRestrictive"}`, the schema is updated only once all segments 
have been reindexed to the new type. The benefit of `leastRestrictive` is that 
it eliminates a number of type coercion errors that can occur in SQL under 
`newestFirst` when types vary across segments and the newest type cannot 
correctly represent older data, for example when segments have a mix of ARRAY 
and number types, or any other combination that leads to odd query plans.
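   
   To make the difference between the two modes concrete, here is a minimal 
sketch. It is not the code in this PR: `ToyType`, `ToyMergePolicy`, and the 
three-step widening order are hypothetical simplifications chosen for 
illustration, and the real `ColumnType` coercion rules are richer than a single 
linear order.

```java
import java.util.List;

final class MergePolicySketch {
    // Toy stand-in for ColumnType (assumption), declared from most
    // restrictive to least restrictive so ordinal order means "wider".
    enum ToyType { LONG, DOUBLE, ARRAY_DOUBLE }

    // Hypothetical policy interface; the interface added by the PR may differ.
    interface ToyMergePolicy {
        // Input: the column's type in each segment, oldest segment first.
        // Assumes a non-empty list.
        ToyType merge(List<ToyType> typesOldestToNewest);
    }

    // "Newest first": the type reported by the newest segment wins.
    static final ToyMergePolicy NEWEST_FIRST =
        types -> types.get(types.size() - 1);

    // "Least restrictive": pick the widest type seen in any segment,
    // so data from every segment can be coerced into it.
    static final ToyMergePolicy LEAST_RESTRICTIVE = types -> {
        ToyType widest = types.get(0);
        for (ToyType t : types) {
            if (t.compareTo(widest) > 0) {
                widest = t;
            }
        }
        return widest;
    };
}
```

   In this sketch, for segments typed `ARRAY_DOUBLE`, `DOUBLE`, `LONG` (oldest 
to newest), `NEWEST_FIRST` yields `LONG` while `LEAST_RESTRICTIVE` yields 
`ARRAY_DOUBLE`. Once every segment is reindexed to `LONG`, both yield `LONG`.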
   
   I am not at all attached to these names, so if they should be called 
something else more intuitive then feel free to suggest.
   
   #### Release note
   A new broker configuration, 
`druid.sql.planner.metadataColumnTypeMergePolicy`, adds configurable modes for 
how column types are computed for the SQL table schema when segments disagree 
on a column's type. A new 'least restrictive type' mode chooses the type that 
data across all segments can best be coerced into. The existing "newest first" 
behavior remains the default, primarily because the new mode changes when 
schema migrations take effect for the SQL schema. With 
`{"type":"newestFirst"}`, the SQL schema is updated as soon as the first job 
with the new schema has published segments; with `{"type":"leastRestrictive"}`, 
the schema is updated only once all segments have been reindexed to the new 
type. However, `{"type":"leastRestrictive"}` is likely to have better 
query-time behavior, and it eliminates some query-time type coercion errors 
that can occur when using `newestFirst`.
   
   <hr>
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

