clintropolis opened a new pull request, #16076:
URL: https://github.com/apache/druid/pull/16076

   ### Description
   This PR modifies column serializers that use compression to split out the 
`Closer` so that we don't have to use the one from 
`SegmentWriteOutMedium.getCloser()`, allowing serializers to optionally release 
the direct memory that most compression strategies allocate much earlier than 
when the segment serialization is completed.
   
   Most callers still use `SegmentWriteOutMedium.getCloser()` since the timing 
doesn't matter much for them, but this change allows nested column serialization 
to use a separate closer associated with each individual 
`GlobalDictionaryEncodedFieldColumnWriter` (there is one of these per nested 
field), releasing its buffers as soon as the field is written instead of holding 
on to them until the entire segment is finished. The main situation this impacts 
is when there is a very large number of nested fields, since the relatively 
small amount of direct memory used by each individual buffer (usually 64KB or 
so) can quickly add up across thousands of paths.
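   The scoping change described above can be sketched roughly as follows. This is an illustrative stand-alone example, not Druid's actual code: `SimpleCloser` and `CompressionBuffer` are hypothetical stand-ins (Druid uses Guava's `Closer` and off-heap compression buffers), showing how a field-scoped closer frees a buffer long before the segment-level closer is closed.

   ```java
   import java.io.Closeable;
   import java.io.IOException;
   import java.util.ArrayDeque;
   import java.util.Deque;

   // Minimal stand-in for a Closer that releases registered resources in LIFO order.
   class SimpleCloser implements Closeable {
       private final Deque<Closeable> resources = new ArrayDeque<>();

       <T extends Closeable> T register(T resource) {
           resources.push(resource);
           return resource;
       }

       @Override
       public void close() throws IOException {
           while (!resources.isEmpty()) {
               resources.pop().close();
           }
       }
   }

   // Hypothetical compression buffer that records whether it has been freed.
   class CompressionBuffer implements Closeable {
       boolean freed = false;

       @Override
       public void close() {
           freed = true; // in Druid this would release direct (off-heap) memory
       }
   }

   public class PerFieldCloserSketch {
       public static void main(String[] args) throws IOException {
           // Segment-level closer: lives until the whole segment is serialized.
           SimpleCloser segmentCloser = new SimpleCloser();

           CompressionBuffer fieldBuffer;
           // Field-level closer: scoped to a single nested field's serialization.
           try (SimpleCloser fieldCloser = new SimpleCloser()) {
               fieldBuffer = fieldCloser.register(new CompressionBuffer());
               // ... serialize the nested field using fieldBuffer ...
           }
           // The field's buffer is already freed here, well before segmentCloser closes.
           System.out.println("field buffer freed early: " + fieldBuffer.freed);

           segmentCloser.close();
       }
   }
   ```

   With one such field-scoped closer per field writer, peak direct memory stays proportional to the fields being serialized at the moment, rather than to the total number of nested fields in the segment.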
   
   I wasn't able to add any direct tests confirming that the buffers are freed 
prior to the `SegmentWriteOutMedium` itself being closed, because all of this 
is tucked pretty far away and would require some gross interface changes to 
push in an observable closer, but I did at least confirm in the debugger that 
this is in fact happening. (The same is true of the 'temp' writeout medium 
these nested columns use: it also releases the temp files that are no longer 
needed after the field is serialized, and this new closer is closed in the same 
place.) This code path is hit during tests, so existing passing tests should 
indicate that this change had no negative effect on serialization.
   
   #### Release note
   Nested column serialization now releases nested field compression buffers as 
soon as the nested field serialization is completed, which should require 
significantly less direct memory during segment serialization when many nested 
fields are present.
   
   <hr>
   
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not 
all of these items apply to every PR. Remove the items which are not done or 
not relevant to the PR. None of the items from the checklist below are strictly 
necessary, but it would be very helpful if you at least self-review the PR. -->
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

