klsince commented on PR #12744:
URL: https://github.com/apache/pinot/pull/12744#issuecomment-2218852472
Hi @itschrispeck, is there a feature flag to disable this optimization?
After upgrading to recent code, we saw higher heap usage than before. The
server logs showed "Reusing the realtime lucene index for segment", which led
me here. In a heap dump, `lucene.index.SegmentCommitInfo` was a top consumer
of heap space, and those instances were mainly referenced by
`RealtimeLuceneTextIndex`.
```
num #instances #bytes class name (module)
-------------------------------------------------------
1: 5481198 686979272 [B ([email protected])
2: 9944219 477322512 java.util.HashMap ([email protected])
3: 4496967 467684568 org.apache.lucene.index.SegmentCommitInfo
4: 8873625 283956000 java.util.HashMap$Node ([email protected])
...
```
We didn't change the TextIndex config for those tables across the upgrades,
so I assume this optimization is enabled by default. I'd like to check
whether there is a feature flag to disable it, so we can validate whether it
caused the higher heap usage.
If there is no such feature flag, I can try to add one for your review. I'm
thinking of adding it around here:
```
// Optimization for realtime segment conversion
if (dataSource instanceof RealtimeSegmentSegmentCreationDataSource) {
  // <--- looks like this is hard-coded to enable the optimization when
  //      committing a mutable segment
  _config.setRealtimeConversion(true);
  _config.setConsumerDir(((RealtimeSegmentSegmentCreationDataSource) dataSource).getConsumerDir());
}
```
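To make the proposal concrete, here is a minimal, self-contained sketch of how such a flag might gate the branch above. Everything in it is an assumption for illustration: the `Config` class is a stand-in for the real generator config, and the property key `realtime.lucene.index.reuse.enabled` is hypothetical, not an existing Pinot setting. The flag defaults to `true` so that current behavior would be unchanged unless someone opts out.

```java
import java.util.Map;

// Hypothetical sketch only: these classes are stand-ins, not Pinot code.
final class RealtimeConversionFlagSketch {

  // Hypothetical property key; not an existing Pinot config.
  static final String REUSE_KEY = "realtime.lucene.index.reuse.enabled";

  // Stand-in for the config object used in the snippet above.
  static final class Config {
    private boolean realtimeConversion;
    private String consumerDir;

    void setRealtimeConversion(boolean enabled) { realtimeConversion = enabled; }
    void setConsumerDir(String dir) { consumerDir = dir; }
    boolean isRealtimeConversion() { return realtimeConversion; }
    String getConsumerDir() { return consumerDir; }
  }

  // Enables the realtime-conversion optimization only when the data source
  // is a realtime one AND the flag is not explicitly set to "false".
  static Config configure(boolean isRealtimeDataSource, String consumerDir,
      Map<String, String> props) {
    Config config = new Config();
    boolean reuseEnabled =
        Boolean.parseBoolean(props.getOrDefault(REUSE_KEY, "true"));
    if (isRealtimeDataSource && reuseEnabled) {
      config.setRealtimeConversion(true);
      config.setConsumerDir(consumerDir);
    }
    return config;
  }
}
```

With the flag set to `false`, the config is left untouched and segment commit would presumably fall back to whatever path was used before the optimization was introduced, which is what we'd want for comparing heap usage.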
cc @Jackie-Jiang
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]