[https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093509#comment-16093509]
ASF GitHub Bot commented on DRILL-5601:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/860#discussion_r128127882
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
@@ -189,30 +238,29 @@ public RecordBatchSizer(VectorAccessible va) {
public RecordBatchSizer(VectorAccessible va, SelectionVector2 sv2) {
rowCount = va.getRecordCount();
for (VectorWrapper<?> vw : va) {
- int size = measureColumn(vw.getValueVector());
- if ( size > maxSize ) { maxSize = size; }
- if ( vw.getField().isNullable() ) { numNullables++; }
+ measureColumn(vw.getValueVector(), "", rowCount);
+ }
+
+ for (BufferLedger ledger : ledgers) {
+ accountedMemorySize += ledger.getAccountedSize();
}
if (rowCount > 0) {
- grossRowWidth = roundUp(totalBatchSize, rowCount);
+ grossRowWidth = roundUp(accountedMemorySize, rowCount);
}
if (sv2 != null) {
sv2Size = sv2.getBuffer(false).capacity();
- grossRowWidth += roundUp(sv2Size, rowCount);
- netRowWidth += 2;
+ accountedMemorySize += sv2Size;
}
- int totalDensity = 0;
- int usableCount = 0;
- for (ColumnSize colSize : columnSizes) {
- if ( colSize.density > 0 ) {
- usableCount++;
- }
- totalDensity += colSize.density;
- }
- avgDensity = roundUp(totalDensity, usableCount);
+ computeEstimates();
+ }
+
+ private void computeEstimates() {
+ grossRowWidth = roundUp(accountedMemorySize, rowCount);
+ netRowWidth = roundUp(netBatchSize, rowCount);
+ avgDensity = roundUp(netBatchSize * 100, accountedMemorySize);
}
public void applySv2() {
--- End diff --
Fixed.
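The arithmetic in the new `computeEstimates()` can be illustrated outside Drill with a small sketch. This is not the PR's code. `Ledger` is a hypothetical stand-in for `BufferLedger`. The set of ledgers is assumed to be an identity set, so a buffer reachable from several vectors is counted once. `roundUp` is assumed to be ceiling division that returns 0 for a zero denominator. As in the diff, the SV2 buffer is added to the accounted memory before the widths and density are derived.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class BatchSizeEstimates {

  /** Hypothetical stand-in for Drill's BufferLedger: one allocation, possibly shared. */
  static final class Ledger {
    final int accountedSize;

    Ledger(int accountedSize) {
      this.accountedSize = accountedSize;
    }
  }

  final int accountedMemorySize;
  final int grossRowWidth;
  final int netRowWidth;
  final int avgDensity;

  /** Ceiling division; returns 0 for a zero denominator (assumed roundUp behavior). */
  static int roundUp(long num, long denom) {
    if (denom == 0) {
      return 0;
    }
    return (int) ((num + denom - 1) / denom);
  }

  BatchSizeEstimates(List<Ledger> buffers, int netBatchSize, int sv2Size, int rowCount) {
    // Identity set: a buffer referenced by several vectors is counted only once.
    Set<Ledger> ledgers = Collections.newSetFromMap(new IdentityHashMap<>());
    ledgers.addAll(buffers);
    int accounted = 0;
    for (Ledger ledger : ledgers) {
      accounted += ledger.accountedSize;
    }
    // The selection vector's buffer is charged to the batch as well.
    accounted += sv2Size;

    accountedMemorySize = accounted;
    grossRowWidth = roundUp(accounted, rowCount);
    netRowWidth = roundUp(netBatchSize, rowCount);
    // Density: percentage of accounted memory that holds data; long avoids int overflow.
    avgDensity = roundUp((long) netBatchSize * 100, accounted);
  }

  public static void main(String[] args) {
    Ledger data = new Ledger(65_536);
    Ledger offsets = new Ledger(16_384);
    // "data" appears twice (reachable from two vectors) but is counted once.
    BatchSizeEstimates e =
        new BatchSizeEstimates(Arrays.asList(data, data, offsets), 40_000, 2_048, 1_000);
    System.out.println("accounted=" + e.accountedMemorySize + " gross=" + e.grossRowWidth
        + " net=" + e.netRowWidth + " density=" + e.avgDensity + "%");
    // prints accounted=83968 gross=84 net=40 density=48%
  }
}
```

The sketch computes `netBatchSize * 100` in `long`; with `int`, that product overflows once the net batch size exceeds roughly 21 MB.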
> Rollup of External Sort memory management fixes
> -----------------------------------------------
>
> Key: DRILL-5601
> URL: https://issues.apache.org/jira/browse/DRILL-5601
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> Rollup of several JIRA entries that all relate to the difficult problem of
> managing memory within Drill so that the external sort stays within its
> memory budget. In general, the fixes improve estimates of the memory used by
> each of the three ways Drill allocates vector memory (see DRILL-5522) and
> predict the size of the vectors the sort will create, to avoid repeated
> realloc-copy cycles (see DRILL-5594).
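To show why predicting vector sizes matters, the sketch below models a buffer that doubles and copies its existing contents whenever the next value will not fit. The assumption is that this roughly approximates a vector's realloc-copy cycle. `ReallocCost`, `fill`, and `Cost` are hypothetical names, not Drill APIs. The sketch compares filling a fixed-width column from a small initial allocation against filling one sized from a predicted row count.

```java
public class ReallocCost {

  /** Outcome of filling a buffer: reallocations, bytes copied, and final capacity. */
  static final class Cost {
    final int reallocs;
    final long bytesCopied;
    final long finalCapacity;

    Cost(int reallocs, long bytesCopied, long finalCapacity) {
      this.reallocs = reallocs;
      this.bytesCopied = bytesCopied;
      this.finalCapacity = finalCapacity;
    }
  }

  /**
   * Writes valueCount fixed-width values into a buffer that starts at initialCapacity
   * bytes and doubles, copying everything written so far, whenever the next value
   * would not fit.
   */
  static Cost fill(long initialCapacity, int valueCount, int valueWidth) {
    if (initialCapacity <= 0 || valueWidth <= 0) {
      throw new IllegalArgumentException("capacity and width must be positive");
    }
    long capacity = initialCapacity;
    long used = 0;
    int reallocs = 0;
    long copied = 0;
    for (int i = 0; i < valueCount; i++) {
      while (used + valueWidth > capacity) {
        copied += used; // each realloc copies the existing contents
        capacity *= 2;
        reallocs++;
      }
      used += valueWidth;
    }
    return new Cost(reallocs, copied, capacity);
  }

  public static void main(String[] args) {
    int rows = 1_000_000;
    int width = 8; // e.g. an 8-byte BIGINT column
    Cost guessed = fill(4_096, rows, width);
    Cost predicted = fill((long) rows * width, rows, width);
    System.out.println("small initial size: " + guessed.reallocs + " reallocs, "
        + guessed.bytesCopied + " bytes copied");
    System.out.println("predicted size:     " + predicted.reallocs + " reallocs, "
        + predicted.bytesCopied + " bytes copied");
  }
}
```

Under these assumptions, starting at 4 KiB takes 11 doublings and copies about 8.4 MB to hold 8 MB of data, and the final buffer overshoots to 8,388,608 bytes. The presized buffer needs no copies. Each real realloc also briefly holds both the old and new buffers, which is part of what makes memory use hard to bound.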
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)