-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18347/
-----------------------------------------------------------
(Updated March 6, 2014, 2:10 a.m.)
Review request for drill, Jacques Nadeau, Jason Altekruse, Mehant Baid, and
Steven Phillips.
Changes
-------
This second patch is a cumulative patch that includes the previous patch plus
the following changes/enhancements:
1. It includes support for the code generation of the probe side of a hash join
(previously, we only needed the 'build' side for the hash aggregate)
2. Fix for a index out of bound error that I encountered while doing testing -
this was related to using the wrong TypedFieldId when accessing the hash
table's container. This is resolved by doing the creation of htContainer as
part of ChainedHashTable and making a clone of it in HashTableTemplate.
3. Address a review comment: Allocate the links, hashValues and startIndices
using our vector allocators (off-heap) instead of on the heap.
Bugs: Drill-318, Drill-335 and and
https://issues.apache.org/jira/browse/Drill-318
https://issues.apache.org/jira/browse/Drill-335
https://issues.apache.org/jira/browse/and
Repository: drill-git
Description
-------
This patch contains the following main components:
1. Implementation of the hash aggregation execution operator - this has two
main parts: the HashAggTemplate and the HashAggBatch.
2. Implementation of a hash table which is used by the hash aggregation. The
hash table hash two main parts: the HashTableTemplate and the ChainedHashTable.
The hash table internally uses the notion of 'BatchHolder' to keep track of all
keys that can fit within one batch of 64K values. New BatcHolder objects are
created as needed. Each BatchHolder has its own vector container. The
HashAggregate also has a similar structure and it keeps track of the workspace
variables.
(NOTE: An initial design document for the hash aggregation and hash table was
already attached with Drill-335. The document has not yet been updated with
the latest implementation ... will try to do that in the near future).
3. Jinfeng's changes to use workspace vectors in the generated code for
aggregate functions (previously, for streaming aggregate we only needed to
maintain workspace variable for 1 running group; however for hash aggregate we
need to maintain it for all groups).
4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is
not valid anymore. I modified the template generation code for the aggregate
functions such that they conform to the new infrastructure.
5. The original AggTemplate, AggBatch and Aggregator classes have been moved to
corresponding StreamingAggTemplate, StreamingAggBatch and StreamingAggregator
in order to differentiate it from hash aggregation. These appear as new files
but the code there has not changed.
Diffs (updated)
-----
common/src/main/java/org/apache/drill/common/expression/fn/AggregationFunctions.java
5b46b78
exec/java-exec/src/main/codegen/config.fmpp 8f1060a
exec/java-exec/src/main/codegen/data/AggrTypes1.tdd PRE-CREATION
exec/java-exec/src/main/codegen/data/AggrTypes2.tdd PRE-CREATION
exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java
PRE-CREATION
exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java
PRE-CREATION
exec/java-exec/src/main/codegen/templates/TypeHelper.java 8c72cf7
exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/GeneratorMapping.java
f99150e
exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/MappingSet.java
1cb2380
exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java
e674eab
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillAggFuncHolder.java
7622865
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFuncHolder.java
dcdf1ea
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java
befa9bf
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/CountFunctions.java
b0939f1
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/SumFunctions.java
57905fe
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/BitFunctions.java
41bb7c6
exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java
8462622
exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java
aed4802
exec/java-exec/src/main/java/org/apache/drill/exec/physical/OperatorCost.java
0bade41
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
ec7244a
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
2e2d7fd
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatch.java
86fea4e
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatchCreator.java
36344f9
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggTemplate.java
a8f39e3
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/Aggregator.java
2683045
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatchCreator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatchCreator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggregator.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
PRE-CREATION
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
509d13b
exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java
6e764a8
exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java
0de64b4
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/SimpleRootExec.java
ccf7758
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSimpleFunctions.java
f6a8096
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggr.java
PRE-CREATION
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java
PRE-CREATION
exec/java-exec/src/test/resources/common/test_hashtable1.json PRE-CREATION
Diff: https://reviews.apache.org/r/18347/diff/
Testing
-------
I have run several tests manually as part of TestHashAggr...these tests use
TPC-H data and in particular a relatively large 'Orders' table. However, I
have not yet packaged the tests to run as part of JUnit since the location and
size of the parquet files needs to be figured out. I will continue to work on
that.
Thanks,
Aman Sinha