Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r198697870
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
##########
@@ -84,6 +97,63 @@
"htRowIdx" /* workspace index */, "incoming" /* read container */,
"outgoing" /* write container */,
"aggrValuesContainer" /* workspace container */, UPDATE_AGGR_INSIDE,
UPDATE_AGGR_OUTSIDE, UPDATE_AGGR_INSIDE);
+ public int getOutputRowCount() {
+ return hashAggMemoryManager.getOutputRowCount();
+ }
+
+ public RecordBatchMemoryManager getRecordBatchMemoryManager() {
+ return hashAggMemoryManager;
+ }
+
+ private class HashAggMemoryManager extends RecordBatchMemoryManager {
+ private int valuesRowWidth = 0;
+
+ HashAggMemoryManager(int outputBatchSize) {
+ super(outputBatchSize);
+ }
+
+ @Override
+ public void update() {
+ // Get sizing information for the batch.
+ setRecordBatchSizer(new RecordBatchSizer(incoming));
+
+ int fieldId = 0;
+ int newOutgoingRowWidth = 0;
+ for (VectorWrapper<?> w : container) {
+ if (w.getValueVector() instanceof FixedWidthVector) {
+ newOutgoingRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth();
+ if (fieldId >= numGroupByExprs) {
+ valuesRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth();
+ }
+ } else {
+ RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(columnMapping.get(w.getValueVector().getField().getName()));
+ newOutgoingRowWidth += columnSize.getAllocSizePerEntry();
+ if (fieldId >= numGroupByExprs) {
+ valuesRowWidth += columnSize.getAllocSizePerEntry();
+ }
+ }
+ fieldId++;
+ }
+
+ updateIncomingStats();
+ if (logger.isDebugEnabled()) {
+ logger.debug("BATCH_STATS, incoming: {}", getRecordBatchSizer());
+ }
+
+ // We do not want to keep adjusting batch holders target row count
Review comment:
The code below is correct, but here is a suggestion for elegance: all of it
uses a single local variable (*newOutgoingRowWidth*), and everything else is
calls to methods of the **RecordBatchMemoryManager** class.
So how about moving all of that code (from here to the end of update())
into a new method in the **RecordBatchMemoryManager** class, possibly called
`updateMemoryManagerIfNeeded(newOutgoingRowWidth)`?
That would be simpler and cleaner.
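
A rough sketch of the shape this refactoring could take. The classes are simplified, hypothetical stand-ins (`SketchMemoryManager` for **RecordBatchMemoryManager**, `SketchHashAggMemoryManager` for the inner class in this diff), and the row-count policy shown (divide the batch budget by the row width, round down to a power of two) is only an assumption for illustration, not the logic in this PR.

```java
// Hypothetical, simplified stand-in for RecordBatchMemoryManager.
// Field names and the sizing policy are assumptions for illustration.
class SketchMemoryManager {
  private final int outputBatchSize; // byte budget for one outgoing batch
  private int outgoingRowWidth;      // last row width acted upon, in bytes
  private int outputRowCount;        // current target row count

  SketchMemoryManager(int outputBatchSize) {
    this.outputBatchSize = outputBatchSize;
  }

  int getOutputRowCount() {
    return outputRowCount;
  }

  int getOutgoingRowWidth() {
    return outgoingRowWidth;
  }

  // The proposed helper: everything that depends only on the new row width
  // lives in the base class. Returns true if the target row count changed.
  boolean updateMemoryManagerIfNeeded(int newOutgoingRowWidth) {
    if (newOutgoingRowWidth == outgoingRowWidth) {
      return false; // width unchanged, nothing to adjust
    }
    outgoingRowWidth = newOutgoingRowWidth;

    // Assumed policy: rows that fit in the budget, rounded down to a power
    // of two so that small width changes do not move the target.
    int rows = Math.max(1, outputBatchSize / Math.max(1, newOutgoingRowWidth));
    int adjusted = Integer.highestOneBit(rows);
    if (adjusted == outputRowCount) {
      return false;
    }
    outputRowCount = adjusted;
    return true;
  }
}

// Hypothetical stand-in for HashAggMemoryManager: update() only computes
// the row width, then delegates to the shared helper.
class SketchHashAggMemoryManager extends SketchMemoryManager {
  SketchHashAggMemoryManager(int outputBatchSize) {
    super(outputBatchSize);
  }

  void update(int[] columnWidths) {
    int newOutgoingRowWidth = 0;
    for (int width : columnWidths) {
      newOutgoingRowWidth += width;
    }
    updateMemoryManagerIfNeeded(newOutgoingRowWidth);
  }
}
```

With this split, the operator-specific update() is reduced to the width computation, and any other operator that sizes its output the same way could reuse the helper.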
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services