clintropolis opened a new pull request, #14351:
URL: https://github.com/apache/druid/pull/14351
### Description
I think we should consider switching the `IndexSpec` default value of
`stringDictionaryEncoding` to `{"type":"frontCoded", "bucketSize":4,
"formatVersion":1}`.
Based on the measurements in #13854, things look pretty good, and we have been
running version 0 of the format for some time on a number of datasources
without any notable performance loss, and version 1 for a shorter amount of
time. I think that by the time 27 is released, it should be sufficiently baked in
for us to feel confident making it the default.
However, this means that upgrading from versions older than 26 will need
special consideration, so if we go forward with this, it is important to call
it out in the release notes.
#### Release note
Front coding was originally introduced in Druid 25.0, and an improved
'version 1' of the format, with typically faster read speed and smaller
storage size, was introduced in Druid 26.0. Front coding version 1 is now the
default in Druid 27.0. This means that, by default, segments created with
Druid 27.0 are backwards compatible with Druid 26.0, but not with Druid
versions older than 26.0. If upgrading to Druid 27.0 from a version older than
26.0, set `stringDictionaryEncoding` to `{"type": "utf8"}` to keep writing out
the older format and enable seamless downgrades to Druid 25.0 and older. Once
it is determined that a rollback is not necessary, it is recommended to switch
to the new default.
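As a rough sketch of the upgrade path described above, the override could be set in the `indexSpec` of an ingestion task's `tuningConfig`. This sketch assumes a native batch (`index_parallel`) task. Only the relevant fields are shown, and the rest of the spec is omitted:

```json
{
  "tuningConfig": {
    "type": "index_parallel",
    "indexSpec": {
      "stringDictionaryEncoding": {
        "type": "utf8"
      }
    }
  }
}
```

Once rollback is ruled out, removing the `stringDictionaryEncoding` override lets new segments fall back to the 27.0 default.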
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description.
- [x] added comments explaining the "why" and the intent of the code
wherever it would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [x] been tested in a test Druid cluster.