[GitHub] [arrow-datafusion] JayjeetAtGithub opened a new issue, #7200: RowConverter keeps growing in size while merging streams on high-cardinality dictionary fields

via GitHub Fri, 04 Aug 2023 13:26:45 -0700


JayjeetAtGithub opened a new issue, #7200:
URL: https://github.com/apache/arrow-datafusion/issues/7200

### Describe the bug

On executing `SortPreservingMerge` on multiple streams of record batches and
using a high-cardinality dictionary field as the sort key, the `RowConverter`
instance used to merge the multiple `RowCursorStreams` keeps growing in memory
(as it keeps accumulating the dict mappings internally in the
`OrderPreservingInterner` structure). This unbounded memory growth eventually
causes data fusion to get killed by the OOM killer.

### To Reproduce

Detailed steps to reproduce this issue is given
[here](https://github.com/JayjeetAtGithub/iox_observe_bench/blob/main/docs/oom_kill.md#reproducing-the-issue).

### Expected behavior

`SortPreservingMerge` on streams of record batches with high-cardinality
dictionary-encoded sort keys should be memory aware and keep memory usage
within a user-defined limit.

### Additional context

**Possible solution:**

1. Keep track of the memory usage for the `RowConverter` using the `size()`
method which in the case of `Dictionary` fields returns the size of the
`OrderPreservingInterner`.
2. If the size of the `RowConverter` grows more than a user-defined memory
limit, take note of the `RowCursorStream` that are still getting converted,
delete the converter, create a new one, and re-do the aborted conversions.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] JayjeetAtGithub opened a new issue, #7200: RowConverter keeps growing in size while merging streams on high-cardinality dictionary fields

Reply via email to