-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18347/
-----------------------------------------------------------

(Updated March 6, 2014, 2:10 a.m.)


Review request for drill, Jacques Nadeau, Jason Altekruse, Mehant Baid, and 
Steven Phillips.


Changes
-------

This second patch is a cumulative patch that includes the previous patch plus 
the following changes/enhancements:

1. It includes support for the code generation of the probe side of a hash join 
(previously, we only needed the 'build' side for the hash aggregate)
2. Fix for a index out of bound error that I encountered while doing testing - 
this was related to using the wrong TypedFieldId when accessing the hash 
table's container. This is resolved by doing the creation of htContainer as 
part of ChainedHashTable and making a clone of it in HashTableTemplate. 
3. Address a review comment: Allocate the links, hashValues and startIndices 
using our vector allocators (off-heap) instead of on the heap.   


Bugs: Drill-318, Drill-335 and and
    https://issues.apache.org/jira/browse/Drill-318
    https://issues.apache.org/jira/browse/Drill-335
    https://issues.apache.org/jira/browse/and


Repository: drill-git


Description
-------

This patch contains the following main components: 
1. Implementation of the hash aggregation execution operator - this has two 
main parts: the HashAggTemplate and the HashAggBatch.  
2. Implementation of a hash table which is used by the hash aggregation.  The 
hash table hash two main parts: the HashTableTemplate and the ChainedHashTable. 
The hash table internally uses the notion of 'BatchHolder' to keep track of all 
keys that can fit within one batch of 64K values.  New BatcHolder objects are 
created as needed.  Each BatchHolder has its own vector container.  The 
HashAggregate also has a similar structure and it keeps track of the workspace 
variables.  
 (NOTE: An initial design document for the hash aggregation and hash table was 
already attached with Drill-335.  The document has not yet been updated with 
the latest implementation ... will try to do that in the near future).
3. Jinfeng's changes to use workspace vectors in the generated code for 
aggregate functions (previously, for streaming aggregate we only needed to 
maintain workspace variable for 1 running group; however for hash aggregate we 
need to maintain it for all groups).  
4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is 
not valid anymore.  I modified the template generation code for the aggregate 
functions such that they conform to the new infrastructure.  
5. The original AggTemplate, AggBatch and Aggregator classes have been moved to 
corresponding StreamingAggTemplate, StreamingAggBatch and StreamingAggregator 
in order to differentiate it from hash aggregation.  These appear as new files 
but the code there has not changed.  


Diffs (updated)
-----

  
common/src/main/java/org/apache/drill/common/expression/fn/AggregationFunctions.java
 5b46b78 
  exec/java-exec/src/main/codegen/config.fmpp 8f1060a 
  exec/java-exec/src/main/codegen/data/AggrTypes1.tdd PRE-CREATION 
  exec/java-exec/src/main/codegen/data/AggrTypes2.tdd PRE-CREATION 
  exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java 
PRE-CREATION 
  exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java 
PRE-CREATION 
  exec/java-exec/src/main/codegen/templates/TypeHelper.java 8c72cf7 
  
exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/GeneratorMapping.java
 f99150e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/MappingSet.java 
1cb2380 
  exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java 
e674eab 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillAggFuncHolder.java
 7622865 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFuncHolder.java 
dcdf1ea 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java
 befa9bf 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/CountFunctions.java
 b0939f1 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/SumFunctions.java
 57905fe 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/BitFunctions.java
 41bb7c6 
  exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 
8462622 
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 
aed4802 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/OperatorCost.java 
0bade41 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
 ec7244a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
 2e2d7fd 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatch.java
 86fea4e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatchCreator.java
 36344f9 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggTemplate.java
 a8f39e3 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/Aggregator.java
 2683045 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatchCreator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatchCreator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggregator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 509d13b 
  
exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java 
6e764a8 
  exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java 
0de64b4 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/SimpleRootExec.java
 ccf7758 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSimpleFunctions.java
 f6a8096 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggr.java
 PRE-CREATION 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java
 PRE-CREATION 
  exec/java-exec/src/test/resources/common/test_hashtable1.json PRE-CREATION 

Diff: https://reviews.apache.org/r/18347/diff/


Testing
-------

I have run several tests manually as part of TestHashAggr...these tests use 
TPC-H data and in particular a relatively large 'Orders' table.  However, I 
have not yet packaged the tests to run as part of JUnit since the location and 
size of the parquet files needs to be figured out. I will continue to work on 
that.  


Thanks,

Aman Sinha

Reply via email to