PairRDDFunctions and DataFrames

2015-07-16 Thread Yana Kadiyska
Hi, could someone point me to the recommended way of using countApproxDistinctByKey with DataFrames? I know I can map to a pair RDD, but I'm wondering if there is a simpler method. If someone knows whether this operation is expressible in SQL, that information would be most appreciated as well.

Re: PairRDDFunctions and DataFrames

2015-07-16 Thread Michael Armbrust
Instead of using that RDD operation, just use the native DataFrame function approxCountDistinct:

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

On Thu, Jul 16, 2015 at 6:58 AM, Yana Kadiyska yana.kadiy...@gmail.com wrote:
Hi, could someone point me to
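
For readers who land on this thread, here is a minimal sketch of what that suggestion might look like in practice. It groups by a key column and applies approxCountDistinct as the aggregate, which is roughly the DataFrame counterpart of countApproxDistinctByKey. The column names (key, value), the sample rows, the 0.05 relative error, and the local master setting are all assumptions made for illustration; none of them come from the thread. It is written against the Spark 1.4-era SQLContext API that was current at the time.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.approxCountDistinct

object ApproxDistinctByKey {
  def main(args: Array[String]): Unit = {
    // Local context for illustration only; a real job would get its master from spark-submit.
    val conf = new SparkConf().setAppName("approx-distinct-by-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample data: (key, value) pairs with a duplicate value under key "a".
    val df = sc.parallelize(Seq(("a", 1), ("a", 2), ("a", 2), ("b", 7))).toDF("key", "value")

    // Group by the key column and estimate the number of distinct values per key.
    // The second argument is the maximum relative standard deviation (assumed 0.05 here).
    val counts = df
      .groupBy("key")
      .agg(approxCountDistinct("value", 0.05).as("approx_distinct"))

    counts.show()
    sc.stop()
  }
}
```

On the SQL part of the question: later Spark releases (2.1 and up) deprecate approxCountDistinct in favor of approx_count_distinct, which is also available as a SQL function there. Whether an equivalent is available from SQL on the 1.4 line depends on the context and parser in use, so check the documentation for your version.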