Also, when doubles are being summed, kahan summation can preserve more bits of doubles.
Sent from my iPhone > On Mar 14, 2014, at 18:11, "Jacques Nadeau" <[email protected]> wrote: > > > >>> On March 12, 2014, 4:36 p.m., Jason Altekruse wrote: >>> exec/java-exec/src/main/codegen/data/AggrTypes1.tdd, line 43 >>> <https://reviews.apache.org/r/18347/diff/2/?file=512023#file512023line43> >>> >>> Is it possible that we would want to make the output type of a sum of >>> BigInts (or even Ints with the dataset sizes we are expecting) a type other >>> than BigInt? If we move to something like a double, or a BigDecimal we >>> could store a more accurate guess of the sum, rather than just overflowing >>> the output value, which I assume would be reasonably common. There is the >>> problem that while doubles can store large values, they lose precision, so >>> as we calculate the sum (assuming it is going to be a large number of small >>> values) once we reach a large partial sum, adding a single value would not >>> increment the sum. We would have to store intermediate sums in some kind of >>> temporary storage until they are large enough to be added to the whole sum >>> and not be lost to rounding. > > I think moving to approximate results implicitly is dangerous. Let's keep > these as big int. If someone has an overflow problem, they could just cast > prior to summing. > > > - Jacques > > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/18347/#review36717 > ----------------------------------------------------------- > > >> On March 6, 2014, 2:10 a.m., Aman Sinha wrote: >> >> ----------------------------------------------------------- >> This is an automatically generated e-mail. To reply, visit: >> https://reviews.apache.org/r/18347/ >> ----------------------------------------------------------- >> >> (Updated March 6, 2014, 2:10 a.m.) >> >> >> Review request for drill, Jacques Nadeau, Jason Altekruse, Mehant Baid, and >> Steven Phillips. >> >> >> Bugs: Drill-318, Drill-335 and and >> https://issues.apache.org/jira/browse/Drill-318 >> https://issues.apache.org/jira/browse/Drill-335 >> https://issues.apache.org/jira/browse/and >> >> >> Repository: drill-git >> >> >> Description >> ------- >> >> This patch contains the following main components: >> 1. Implementation of the hash aggregation execution operator - this has two >> main parts: the HashAggTemplate and the HashAggBatch. >> 2. Implementation of a hash table which is used by the hash aggregation. >> The hash table hash two main parts: the HashTableTemplate and the >> ChainedHashTable. The hash table internally uses the notion of 'BatchHolder' >> to keep track of all keys that can fit within one batch of 64K values. New >> BatcHolder objects are created as needed. Each BatchHolder has its own >> vector container. The HashAggregate also has a similar structure and it >> keeps track of the workspace variables. >> (NOTE: An initial design document for the hash aggregation and hash table >> was already attached with Drill-335. The document has not yet been updated >> with the latest implementation ... will try to do that in the near future). >> 3. Jinfeng's changes to use workspace vectors in the generated code for >> aggregate functions (previously, for streaming aggregate we only needed to >> maintain workspace variable for 1 running group; however for hash aggregate >> we need to maintain it for all groups). >> 4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is >> not valid anymore. I modified the template generation code for the >> aggregate functions such that they conform to the new infrastructure. >> 5. The original AggTemplate, AggBatch and Aggregator classes have been moved >> to corresponding StreamingAggTemplate, StreamingAggBatch and >> StreamingAggregator in order to differentiate it from hash aggregation. >> These appear as new files but the code there has not changed. >> >> >> Diffs >> ----- >> >> >> common/src/main/java/org/apache/drill/common/expression/fn/AggregationFunctions.java >> 5b46b78 >> exec/java-exec/src/main/codegen/config.fmpp 8f1060a >> exec/java-exec/src/main/codegen/data/AggrTypes1.tdd PRE-CREATION >> exec/java-exec/src/main/codegen/data/AggrTypes2.tdd PRE-CREATION >> exec/java-exec/src/main/codegen/templates/AggrTypeFunctions1.java >> PRE-CREATION >> exec/java-exec/src/main/codegen/templates/AggrTypeFunctions2.java >> PRE-CREATION >> exec/java-exec/src/main/codegen/templates/TypeHelper.java 8c72cf7 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/GeneratorMapping.java >> f99150e >> >> exec/java-exec/src/main/java/org/apache/drill/exec/compile/sig/MappingSet.java >> 1cb2380 >> exec/java-exec/src/main/java/org/apache/drill/exec/expr/ClassGenerator.java >> e674eab >> >> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillAggFuncHolder.java >> 7622865 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/DrillFuncHolder.java >> dcdf1ea >> >> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionConverter.java >> befa9bf >> >> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/CountFunctions.java >> b0939f1 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/agg/impl/SumFunctions.java >> 57905fe >> >> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/BitFunctions.java >> 41bb7c6 >> exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java >> 8462622 >> exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java >> aed4802 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/OperatorCost.java >> 0bade41 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractPhysicalVisitor.java >> ec7244a >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/PhysicalVisitor.java >> 2e2d7fd >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashAggregate.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatch.java >> 86fea4e >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggBatchCreator.java >> 36344f9 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/AggTemplate.java >> a8f39e3 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/Aggregator.java >> 2683045 >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatchCreator.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggregator.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatch.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggBatchCreator.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggregator.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java >> PRE-CREATION >> >> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java >> 509d13b >> >> exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java >> 6e764a8 >> exec/java-exec/src/test/java/org/apache/drill/exec/expr/ExpressionTest.java >> 0de64b4 >> >> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/SimpleRootExec.java >> ccf7758 >> >> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSimpleFunctions.java >> f6a8096 >> >> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggr.java >> PRE-CREATION >> >> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/common/TestHashTable.java >> PRE-CREATION >> exec/java-exec/src/test/resources/common/test_hashtable1.json PRE-CREATION >> >> Diff: https://reviews.apache.org/r/18347/diff/ >> >> >> Testing >> ------- >> >> I have run several tests manually as part of TestHashAggr...these tests use >> TPC-H data and in particular a relatively large 'Orders' table. However, I >> have not yet packaged the tests to run as part of JUnit since the location >> and size of the parquet files needs to be figured out. I will continue to >> work on that. >> >> >> Thanks, >> >> Aman Sinha >
