clintropolis opened a new pull request, #14002:
URL: https://github.com/apache/druid/pull/14002

   ### Description
   While working on some other stuff I noticed that we were doing something 
when loading segments which results in some extra heap usage storing redundant 
stuff we don't really need. Specifically, the changes in #12279 causes us to 
make a combined column and dimension list in the form of `Indexed` to pass to 
the `SimpleQueryableIndex`, which it was doing prior to this PR with 
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/IndexIO.java#L648
 `GenericIndexed.fromIterable` which is going to make new heap bytebuffers of 
the column and dimension lists. The columns `Indexed` is thrown away basically, 
but the dimensions list is kept around forever.
   
   We actually already have to have all of these column names on heap, and are 
already interning them via the `FileSmooshMapper` 
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/java/util/common/io/smoosh/SmooshedFileMapper.java#L88
 which does so whenever it reads `meta.smoosh` to get the internal file list of 
a smoosh. There is an internal file as the entry point for every column name, 
so if we re-use this interner for the map of `ColumnHolder` suppliers as well 
as to back the `Indexed<String>` for the dimension list then we can save some 
heap.
   
   With my laptop test cluster of ~5k segments it reduces the heap footprint by 
~20MB
   
   Before:
   ![Screenshot 2023-03-29 at 8 27 59 
PM](https://user-images.githubusercontent.com/1577461/228732099-bcea909c-c400-4d42-b6db-c5ec2033af19.png)
   
   After:
   ![Screenshot 2023-03-29 at 8 28 22 
PM](https://user-images.githubusercontent.com/1577461/228732186-f4c6f7a5-1718-4edb-b3dc-28bd402166c7.png)
   
   Looking specifically at strings, we now have about 1/3 as many on heap (from 
the map of ColumnHolder suppliers), not counting all of the heap bytebuffers 
which were backing the dimensions lists.
   ![Screenshot 2023-03-29 at 8 38 54 
PM](https://user-images.githubusercontent.com/1577461/228732287-5166a7c4-3342-4cee-8240-6ba32ba51f13.png)
   ![Screenshot 2023-03-29 at 8 39 00 
PM](https://user-images.githubusercontent.com/1577461/228732298-0dfc568e-0d3f-488d-9ae9-af19de3dc033.png)
   
   
   Completely unrelated, this PR also fixes a bug with native expression type 
coercion with complex types. The added test would fail where it would return 
"unknown complex" sometimes when using the `ExpressionTypeCoercion.operator` 
method. I guess I could have made it its own PR, but both things were small...
   
   
   
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to