Re: incomplete aggregation in a GROUP BY

2016-11-03 Thread Michael Armbrust
Sounds like a bug. If you can reproduce it on 1.6.3 (currently being voted on), then please open a JIRA.

On Thu, Nov 3, 2016 at 8:05 AM, Donald Matthews wrote:
> While upgrading a program from Spark 1.5.2 to Spark 1.6.2, I've run into a
> HiveContext GROUP BY that no longer

incomplete aggregation in a GROUP BY

2016-11-03 Thread Donald Matthews
While upgrading a program from Spark 1.5.2 to Spark 1.6.2, I've run into a HiveContext GROUP BY that no longer works reliably. The GROUP BY results are not always fully aggregated; instead, I get many duplicate and triplicate sets of group values. I've come up with a workaround that works for
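
One way to confirm the symptom described above: a fully aggregated GROUP BY result has exactly one row per distinct combination of group values, so any combination that shows up more than once indicates incomplete aggregation. Below is a minimal sketch in plain Python, assuming the result rows have already been collected to the driver as tuples. The function name `duplicated_groups`, the `key_indexes` parameter, and the sample rows are hypothetical and are not taken from the original program.

```python
from collections import Counter

def duplicated_groups(rows, key_indexes):
    """Return {group_key: count} for group keys that appear in more than one row.

    rows: iterable of tuples (for example, GROUP BY output collected to the driver).
    key_indexes: positions of the GROUP BY columns within each row.
    """
    counts = Counter(tuple(row[i] for i in key_indexes) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical output of a query shaped like: SELECT a, b, SUM(x) ... GROUP BY a, b
rows = [("east", 1, 10), ("west", 1, 7), ("east", 1, 5)]
print(duplicated_groups(rows, key_indexes=[0, 1]))  # {('east', 1): 2}
```

An empty dictionary means every group appears once; a non-empty one lists the group values that were split across several rows, which may be useful detail to attach to a JIRA report.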