[ https://issues.apache.org/jira/browse/HIVE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mayank Lahiri updated HIVE-1372:
--------------------------------

    Attachment: HIVE-1372.3.patch

AFAIK, this is a floating point rounding error. I ran some tests on millions of large random doubles and the differences are consistently in the last few significant digits.

Curiously, even the vanilla un-modified sum() UDAF produces some differences in the last few digits from R's output when operating on large-ish synthetic data, which leads me to believe that either Hive or Java's default println is pushing out a few more digits than it should, or Java's floating point handling is somehow quirky in terms of rounding.

I've corrected the two .q.out files and attached the patch.

> New algorithm for variance() UDAF
> ---------------------------------
>
>                 Key: HIVE-1372
>                 URL: https://issues.apache.org/jira/browse/HIVE-1372
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Mayank Lahiri
>            Assignee: Mayank Lahiri
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1372.2.patch, HIVE-1372.3.patch, HIVE-1372.patch
>
>
> A new algorithm for the UDAF that computes variance. This is pretty much a
> drop-in replacement for the current UDAF, and has two benefits: provably
> numerically stable (reference included in comments), and reduces arithmetic
> operations by about half.
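
For readers who want to see why numerical stability matters for a variance aggregate, here is a minimal Java sketch. It is not taken from the attached patches: it uses Welford's well-known single-pass update as a stand-in for "a numerically stable algorithm", and the class and method names (VarianceSketch, naiveVariance, welfordVariance) are made up for illustration. It compares that update against the textbook sum-of-squares formula on a few values with a large common offset.

```java
public class VarianceSketch {

    // Textbook formula E[x^2] - (E[x])^2. For values near 1e9 the squares are
    // near 1e18, where adjacent doubles are 128 apart, so the final
    // subtraction cancels away the digits that carry the answer.
    static double naiveVariance(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / xs.length;
        return sumSq / xs.length - mean * mean;
    }

    // Welford's single-pass update (an assumption for illustration, not
    // necessarily the algorithm in the patch): keeps a running mean and the
    // running sum of squared deviations from that mean.
    static double welfordVariance(double[] xs) {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0;
        for (double x : xs) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        // Population variance; NaN for empty input.
        return n > 0 ? m2 / n : Double.NaN;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: small spread around a large offset.
        double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        System.out.println("naive:   " + naiveVariance(xs));
        System.out.println("welford: " + welfordVariance(xs));
    }
}
```

The exact population variance of this sample is 22.5. The Welford-style update works with differences from the running mean, which stay small, while the naive formula works with quantities near 1e18 and can only return a multiple of 128 here.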