[
https://issues.apache.org/jira/browse/DRILL-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-5416.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.12.0
> Vectors read from disk report incorrect memory sizes
> ----------------------------------------------------
>
> Key: DRILL-5416
> URL: https://issues.apache.org/jira/browse/DRILL-5416
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: 1.12.0
>
>
> The external sort and revised hash agg operators spill to disk using a vector
> serialization mechanism. This mechanism serializes each vector as a (length,
> bytes) pair.
> Before spilling, if we check the memory used for a vector (using the new
> {{RecordBatchSizer}} class), we learn of the actual memory consumed by the
> vector, including any unused space in the vector.
> If we spill the vector, then reread it, the reported storage size is wrong.
> On reading, the code allocates a buffer, based on the saved length, rounded
> up to the next power of two. Then, when building the vector, we "slice" the
> read buffer, setting the memory size to the data size.
> For example, suppose we save 20 1-byte fields. The size on disk is 20 bytes.
> The read buffer is rounded up to 32 bytes (the size of the original,
> pre-spill buffer). We read the 20 bytes and create a vector. Creating the
> vector reports the memory size as 20 bytes, "hiding" the extra, unused 12
> bytes.
> As a result, memory size computations on reread vectors understate the
> memory actually allocated. Working with false numbers means that the code
> cannot safely operate within a memory budget, causing the user to receive an
> unexpected out-of-memory (OOM) error.
> As it turns out, the code path that does the slicing is used only for reads
> from disk. This ticket asks to remove the slicing step: just use the
> allocated buffer directly so that the after-read vector reports the correct
> memory usage, the same as the before-spill vector.
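
The 20-byte example in the description can be reproduced with a minimal sketch. It uses plain {{java.nio.ByteBuffer}} as a stand-in and is not Drill's buffer or vector code; the class {{SpillSizeSketch}} and its methods are hypothetical names chosen for illustration. It assumes only what the description states: the read buffer is allocated at the saved length rounded up to the next power of two, and the vector is then built on a slice limited to the data length.

```java
import java.nio.ByteBuffer;

// Illustrative only: models the size arithmetic from the ticket, not Drill's classes.
class SpillSizeSketch {

    // Round a length up to the next power of two (assumed allocator behavior).
    static int roundUpToPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    // Allocate the read buffer for a spilled vector of the given saved length.
    static ByteBuffer allocateForRead(int savedLength) {
        return ByteBuffer.allocate(roundUpToPowerOfTwo(savedLength));
    }

    // "Slice" the read buffer down to the data length, as the reread path did.
    static ByteBuffer sliceToData(ByteBuffer readBuffer, int dataLength) {
        ByteBuffer view = readBuffer.duplicate();
        view.position(0);
        view.limit(dataLength);
        return view.slice(); // capacity of the slice equals dataLength
    }

    public static void main(String[] args) {
        int savedLength = 20; // 20 one-byte fields, as in the ticket

        ByteBuffer readBuffer = allocateForRead(savedLength);
        ByteBuffer sliced = sliceToData(readBuffer, savedLength);

        int allocated = readBuffer.capacity(); // memory actually held
        int reported = sliced.capacity();      // what the sliced view reports

        System.out.println("allocated = " + allocated);            // allocated = 32
        System.out.println("reported  = " + reported);             // reported  = 20
        System.out.println("hidden    = " + (allocated - reported)); // hidden    = 12
    }
}
```

Building the vector on {{readBuffer}} directly, instead of on the slice, makes the reported size match the 32 bytes actually allocated, which is the change the ticket requests.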