Also, when doubles are being summed, kahan summation can preserve more bits of 
doubles. 

Sent from my iPhone

> On Mar 14, 2014, at 18:11, "Jacques Nadeau" <[email protected]> wrote:
> 
> 
> 
>>> On March 12, 2014, 4:36 p.m., Jason Altekruse wrote:
>>> exec/java-exec/src/main/codegen/data/AggrTypes1.tdd, line 43
>>> <https://reviews.apache.org/r/18347/diff/2/?file=512023#file512023line43>
>>> 
>>>    Is it possible that we would want to make the output type of a sum of 
>>> BigInts (or even Ints with the dataset sizes we are expecting) a type other 
>>> than BigInt? If we move to something like a double, or a BigDecimal we 
>>> could store a more accurate guess of the sum, rather than just overflowing 
>>> the output value, which I assume would be reasonably common. There is the 
>>> problem that while doubles can store large values, they lose precision, so 
>>> as we calculate the sum (assuming it is going to be a large number of small 
>>> values) once we reach a large partial sum, adding a single value would not 
>>> increment the sum. We would have to store intermediate sums in some kind of 
>>> temporary storage until they are large enough to be added to the whole sum 
>>> and not be lost to rounding.
> 
> I think moving to approximate results implicitly is dangerous.  Let's keep 
> these as big int.  If someone has an overflow problem, they could just cast 
> prior to summing.
> 
> 
> - Jacques
> 
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18347/#review36717
> -----------------------------------------------------------
> 
> 
>> On March 6, 2014, 2:10 a.m., Aman Sinha wrote:
>> 
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/18347/
>> -----------------------------------------------------------
>> 
>> (Updated March 6, 2014, 2:10 a.m.)
>> 
>> 
>> Review request for drill, Jacques Nadeau, Jason Altekruse, Mehant Baid, and 
>> Steven Phillips.
>> 
>> 
>> Bugs: Drill-318, Drill-335 and and
>>    https://issues.apache.org/jira/browse/Drill-318
>>    https://issues.apache.org/jira/browse/Drill-335
>>    https://issues.apache.org/jira/browse/and
>> 
>> 
>> Repository: drill-git
>> 
>> 
>> Description
>> -------
>> 
>> This patch contains the following main components: 
>> 1. Implementation of the hash aggregation execution operator - this has two 
>> main parts: the HashAggTemplate and the HashAggBatch.  
>> 2. Implementation of a hash table which is used by the hash aggregation.  
>> The hash table hash two main parts: the HashTableTemplate and the 
>> ChainedHashTable. The hash table internally uses the notion of 'BatchHolder' 
>> to keep track of all keys that can fit within one batch of 64K values.  New 
>> BatcHolder objects are created as needed.  Each BatchHolder has its own 
>> vector container.  The HashAggregate also has a similar structure and it 
>> keeps track of the workspace variables.  
>> (NOTE: An initial design document for the hash aggregation and hash table 
>> was already attached with Drill-335.  The document has not yet been updated 
>> with the latest implementation ... will try to do that in the near future).
>> 3. Jinfeng's changes to use workspace vectors in the generated code for 
>> aggregate functions (previously, for streaming aggregate we only needed to 
>> maintain workspace variable for 1 running group; however for hash aggregate 
>> we need to maintain it for all groups).  
>> 4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is 
>> not valid anymore.  I modified the template generation code for the 
>> aggregate functions such that they conform to the new infrastructure.  
>> 5. The original AggTemplate, AggBatch and Aggregator classes have been moved 
>> to corresponding StreamingAggTemplate, StreamingAggBatch and 
>> StreamingAggregator in order to differentiate it from hash aggregation.  
>> These appear as new files but the code there has not changed.  
>> 
>> 
>> Diffs
>> -----
>> 
>>  
>> common/src/main/java/org/apache/drill/common/expression/fn/AggregationFunctions.java
>>  5b46b78 
>>  exec/java-exec/src/main/codegen/config.fmpp 8f1060a 
>>  exec/java-exec/src/main/codegen/data/AggrTypes1.tdd PRE-CREATION 
>>  exec/java-exec/src/main/codegen/data/AggrTypes2.tdd PRE-CREATION 
>>  exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java 
>> PRE-CREATION 
>>  exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java 
>> PRE-CREATION 
>>  exec/java-exec/src/main/codegen/templates/TypeHelper.java 8c72cf7 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/GeneratorMapping.java
>>  f99150e 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/MappingSet.java
>>  1cb2380 
>>  exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java 
>> e674eab 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillAggFuncHolder.java
>>  7622865 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFuncHolder.java
>>  dcdf1ea 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java
>>  befa9bf 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/CountFunctions.java
>>  b0939f1 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/SumFunctions.java
>>  57905fe 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/BitFunctions.java
>>  41bb7c6 
>>  exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java 
>> 8462622 
>>  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 
>> aed4802 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/OperatorCost.java
>>  0bade41 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java
>>  ec7244a 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java
>>  2e2d7fd 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatch.java
>>  86fea4e 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatchCreator.java
>>  36344f9 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggTemplate.java
>>  a8f39e3 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/Aggregator.java
>>  2683045 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatchCreator.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatchCreator.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggregator.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
>>  509d13b 
>>  
>> exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java
>>  6e764a8 
>>  exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java 
>> 0de64b4 
>>  
>> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/SimpleRootExec.java
>>  ccf7758 
>>  
>> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSimpleFunctions.java
>>  f6a8096 
>>  
>> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggr.java
>>  PRE-CREATION 
>>  
>> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java
>>  PRE-CREATION 
>>  exec/java-exec/src/test/resources/common/test_hashtable1.json PRE-CREATION 
>> 
>> Diff: https://reviews.apache.org/r/18347/diff/
>> 
>> 
>> Testing
>> -------
>> 
>> I have run several tests manually as part of TestHashAggr...these tests use 
>> TPC-H data and in particular a relatively large 'Orders' table.  However, I 
>> have not yet packaged the tests to run as part of JUnit since the location 
>> and size of the parquet files needs to be figured out. I will continue to 
>> work on that.  
>> 
>> 
>> Thanks,
>> 
>> Aman Sinha
> 

Reply via email to