Jackie-Jiang commented on code in PR #15685:
URL: https://github.com/apache/pinot/pull/15685#discussion_r2078521964
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -117,9 +131,15 @@ private void addFlattenedRecords(List<Map<String, String>> records) {
for (Map.Entry<String, String> entry : record.entrySet()) {
// Put both key and key-value into the posting list. Key is useful for checking if a key exists in the json.
String key = entry.getKey();
- _postingListMap.computeIfAbsent(key, k -> new RoaringBitmap()).add(_nextFlattenedDocId);
+ _postingListMap.computeIfAbsent(key, k -> {
+ _bytesSize += Utf8.encodedLength(key);
Review Comment:
Are we only counting the size of the keys?
If the goal is to limit heap usage, we should count the size of the
String's underlying byte array rather than its UTF-8 encoded length. The
UTF-8 length is the size in the final index file, not the size on the heap.
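To illustrate the difference, here is a minimal sketch, not Pinot code. It assumes JDK 9+ with compact strings enabled, where a String's backing array uses 1 byte per char when every char fits in Latin-1 and 2 bytes per char otherwise. `StringSizeSketch`, `utf8Length`, and `backingArrayLength` are hypothetical names, and `getBytes` stands in for `Utf8.encodedLength` so the example has no dependencies. Object headers and padding are ignored.

```java
import java.nio.charset.StandardCharsets;

public class StringSizeSketch {
  /** Bytes the key occupies when encoded as UTF-8 (the on-disk style measure). */
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  /**
   * Rough size of the String's backing byte[] payload, ASSUMING JDK 9+ compact strings:
   * 1 byte per char if all chars fit in Latin-1, otherwise 2 bytes per char (UTF-16).
   * Hypothetical helper; ignores object headers and alignment padding.
   */
  static int backingArrayLength(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) > 0xFF) {
        return 2 * s.length();
      }
    }
    return s.length();
  }

  public static void main(String[] args) {
    // ASCII key, Latin-1 key (e with acute accent), and a key with one CJK character.
    String[] keys = {"$.name", "$.caf\u00e9", "$.\u540d"};
    for (String key : keys) {
      System.out.println(key + " utf8=" + utf8Length(key) + " backingArray~=" + backingArrayLength(key));
    }
  }
}
```

The two measures agree for ASCII keys but diverge in both directions for non-ASCII keys, so the UTF-8 length is only an approximation of heap usage.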
##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java:
##########
@@ -196,7 +196,12 @@ public enum ServerMeter implements AbstractMetrics.Meter {
PREDOWNLOAD_FAILED("predownloadFailed", true),
// reingestion metrics
- SEGMENT_REINGESTION_FAILURE("segments", false);
+ SEGMENT_REINGESTION_FAILURE("segments", false),
+
+ /**
+ * Approximate heap bytes used by the mutable JSON index at the time of index close.
+ */
+ REALTIME_JSON_INDEX_MEMORY_USAGE("bytes", true);
Review Comment:
(minor) Consider renaming to match the class name
```suggestion
MUTABLE_JSON_INDEX_MEMORY_USAGE("bytes", true);
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -722,6 +742,20 @@ public String[] getValuesSV(int[] docIds, int length, Map<String, RoaringBitmap>
@Override
public void close() {
+ try {
+ String tableName = SegmentUtils.getTableNameFromSegmentName(_segmentName);
+ _serverMetrics.addMeteredTableValue(tableName, _columnName, ServerMeter.REALTIME_JSON_INDEX_MEMORY_USAGE,
Review Comment:
Please double-check that the metric can be emitted properly using `server.yml`
(under `jmx_prometheus_javaagent`), given that you are also adding the column name.