kfaraz opened a new issue #12022:
URL: https://github.com/apache/druid/issues/12022
### Motivation
The existing implementation in `OnHeapIncrementalIndex` tends to
over-estimate memory usage,
thus leading to more persistence cycles than necessary during ingestion. A
more accurate estimation
of memory usage would also free up memory for other purposes.
The recent changes in #11950 improved on this by adding a
`guessAggregatorHeapFootprint()` method to the `Aggregator` interface. This
proposal addresses the same problem but advocates replacing the guess
mechanism with the actual incremental memory used by an aggregator.
### Proposed changes
- Update the method `Aggregator.aggregate()` to return a `long` instead of
`void`. The returned long would represent the incremental memory in bytes used
by the aggregator in that particular invocation of the method.
- Remove the method `DimensionIndexer.estimateEncodedKeyComponentSize()`
- Update the method `DimensionIndexer.getUnsortedEncodedValueFromSorted()`
to return objects of a new generic class `EncodedDimensionValue<EncodedType>`
which contains:
  - `EncodedType value`: e.g. `int[]` for `StringDimensionIndexer`, `Long` for
`LongDimensionIndexer`
  - `long incrementalSize`: the delta in size required for the `value`. For
numerical values, e.g. in `LongDimensionIndexer`, it would simply be the size of
the datatype. But for `StringDimensionIndexer`, which returns an encoded
`int[]`, it would represent the size of the array plus the size of any _new_
dimension value encoded into the dictionary in this invocation. Simply put,
`getUnsortedEncodedValueFromSorted()` would now return both a payload and the
memory required for that payload.
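For concreteness, the proposed changes could look roughly like the following sketch. The field names `value` and `incrementalSize` are from the proposal above; the getter names and other details are illustrative, not final.

```java
// Sketch of the proposed API changes (details illustrative, not final).
interface Aggregator
{
  /**
   * Performs aggregation and returns the incremental heap memory in bytes
   * used by this particular invocation (0 if no new memory was allocated).
   */
  long aggregate();
}

/**
 * Payload proposed to be returned by
 * DimensionIndexer.getUnsortedEncodedValueFromSorted(): the encoded value
 * plus the delta in heap size it required.
 */
class EncodedDimensionValue<EncodedType>
{
  private final EncodedType value;      // e.g. int[] for strings, Long for longs
  private final long incrementalSize;   // bytes newly required for this value

  EncodedDimensionValue(EncodedType value, long incrementalSize)
  {
    this.value = value;
    this.incrementalSize = incrementalSize;
  }

  EncodedType getValue()
  {
    return value;
  }

  long getIncrementalSize()
  {
    return incrementalSize;
  }
}
```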
### Rationale
Estimation of a row size:
`row size = aggregator size + dims key size + overhead`
__Aggregator Size__
Currently, we compute the _max_ size for the aggregator and use it for every
row, which overestimates the actual memory usage. With the proposed change, we
would use the actual footprint for each row rather than the max, thus getting
more accurate estimates.
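As a toy illustration of the difference (the numbers below are made up, not from any Druid benchmark), consider an aggregator whose max estimated size is 64 KB but which allocates that memory only on its first invocation:

```java
// Toy comparison (illustrative numbers): total aggregator memory charged for
// a batch of rows under the current max-size approach vs the proposed
// per-invocation approach, assuming the aggregator allocates its full 64 KB
// on the first row and nothing afterwards.
class AggregatorEstimateComparison
{
  static final long MAX_SIZE_PER_ROW = 64 * 1024;

  /** Current approach: every row is charged the max estimated size. */
  static long maxBasedEstimate(long numRows)
  {
    return numRows * MAX_SIZE_PER_ROW;
  }

  /** Proposed approach: sum of the actual bytes returned by aggregate(). */
  static long actualEstimate(long numRows)
  {
    // In this toy scenario, only the first invocation allocates anything.
    return numRows > 0 ? MAX_SIZE_PER_ROW : 0;
  }
}
```

For 1000 rows, the max-based estimate is 1000x the actual footprint, which is what triggers the unnecessary persist cycles described above.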
__Dims Key Size__
The estimation of the dimension key size already takes into account the
current row and not the max size. But here, the overestimates are caused by
repeatedly adding the footprint of the same String values (especially in the
case of multi-valued dimensions), which are in fact stored _only once_ in the
dictionary; only the integer codes are present in every row. With the
proposed change, we would add the footprint of a String value only once, when
it is newly added to the dictionary.
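The dictionary-aware accounting can be sketched with a toy encoder (this is illustrative code, not Druid's `DimensionDictionary`; the 2-bytes-per-char footprint ignores object headers and is deliberately simplistic):

```java
import java.util.HashMap;
import java.util.Map;

// Toy dictionary encoder that reports the incremental footprint of each
// value: a String's footprint is charged only on its first occurrence, while
// repeats cost only the size of the per-row int code.
class ToyDictionaryEncoder
{
  private static final long INT_CODE_SIZE = Integer.BYTES;

  private final Map<String, Integer> dictionary = new HashMap<>();

  /** Returns {encodedId, incrementalSizeInBytes} for the given value. */
  long[] encode(String value)
  {
    Integer id = dictionary.get(value);
    if (id != null) {
      // Already in the dictionary: only the per-row int code is new memory.
      return new long[]{id, INT_CODE_SIZE};
    }
    id = dictionary.size();
    dictionary.put(value, id);
    // New entry: charge the String itself plus the per-row int code.
    // (Toy estimate: 2 bytes per char, no object-header overhead.)
    long footprint = 2L * value.length() + INT_CODE_SIZE;
    return new long[]{id, footprint};
  }
}
```

With this accounting, a multi-valued dimension that repeats `"foo"` across many rows charges the String footprint once and then only `Integer.BYTES` per subsequent row, instead of re-charging the full String every time.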
### Backwards Compatibility
Neither of the changes mentioned above would be backwards compatible.
__Aggregator Size__
Any class implementing the `Aggregator` interface would need to be fixed to
return a `long` instead of `void`.
Workaround (Rejected):
To retain compatibility, we could add a new `default long
aggregateAndEstimateMemory()` method and leave the existing `aggregate()`
method as is. The default implementation would return a negative value (say -1),
in which case the `OnHeapIncrementalIndex` or any other estimator would use the
max estimated size for that invocation.
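A rough sketch of this rejected workaround (the interface name here is illustrative, chosen to avoid clashing with the real `Aggregator`):

```java
// Sketch of the rejected workaround: keep aggregate() as is and add a
// default method that signals "memory not tracked" with a negative value.
interface LegacyCompatibleAggregator
{
  void aggregate();

  /**
   * Aggregates and returns the incremental memory used in bytes, or a
   * negative value if this implementation does not track memory, in which
   * case the caller falls back to the max estimated size.
   */
  default long aggregateAndEstimateMemory()
  {
    aggregate();
    return -1;
  }
}
```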
But this approach would be pretty hacky and problematic in the long run, as
callers of the `Aggregator` would be free to call either of the two aggregate
methods, thus producing wildly different and erroneous memory estimations.
__Dims Key Size__
Any class implementing the `DimensionIndexer` interface would have to be fixed.
(This change is less of a compatibility concern than the `Aggregator` change,
as `DimensionIndexer` has fewer implementations.)
Workaround (Rejected):
To retain compatibility, we could retain
`DimensionIndexer.estimateEncodedKeyComponentSize()`, which would account for a
String value only when it is newly encountered. But this would require
maintaining two dictionaries: the first used by
`getUnsortedEncodedValueFromSorted()` to track encoded String values,
and the second used by `estimateEncodedKeyComponentSize()` to track estimated
String values.
This workaround would introduce the unnecessary complexity and overhead of
maintaining two dictionaries.
### Operational impact
- All aggregator implementations in extensions would have to be updated.
- Rolling Upgrade: Not affected
### Future Work (optional)
The memory estimate values returned by the updated `aggregate()` method
could be used by other callers (such as `TimeseriesQueryEngine`) to estimate
memory if required.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]