clintropolis opened a new pull request, #16076:
URL: https://github.com/apache/druid/pull/16076
### Description
This PR modifies column serializers that use compression to split out the
`Closer` so that we don't have to use the one from
`SegmentWriteoutMedium.getCloser()`, allowing serializers to optionally release
the direct memory that most compression strategies
allocateSegmentWriteOutMediummuch earlier than when the segment serialization
is completed.
Most callers still use `SegmentWriteOutMedium.getCloser()` because it
doesn't matter too much, but this change allows nested column serialization to
use a separate closer associated with an individual
`GlobalDictionaryEncodedFieldColumnWriter` (one of these for each field) to
release as soon as the nested field is written instead of holding on to them
until the entire segment is finished. The main situation this will impact is
when there are a very large number of nested fields, as the relatively small
amount of direct memory used by each individual buffer (usually 64kb or so)
could quickly add up when there are thousands of paths.
I wasn't able to add any direct tests to confirm the buffers are freed prior
to the `SegmentWriteOutMedium` itself being closed because all of this stuff is
tucked pretty far away and would have to have some gross interface changes to
push in an observable closer, but, did at least confirm in the debugger that
this is in fact happening. (The same is true of the 'temp' writeout medium
these nested columns use to also release the temp files that are no longer
needed after the field is serialized, this new closer is closed in the same
place). This code path is hit during tests, so existing passing tests should
indicate that this change had no negative effect on serialization.
#### Release note
Nested column serialization now releases nested field compression buffers as
soon as the nested field serialization is completed, which should require
significantly less direct memory during segment serialization when many nested
fields are present.
<hr>
<!-- Check the items by putting "x" in the brackets for the done things. Not
all of these items apply to every PR. Remove the items which are not done or
not relevant to the PR. None of the items from the checklist below are strictly
necessary, but it would be very helpful if you at least self-review the PR. -->
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]