Hi, could someone point me to the recommended way of using
countApproxDistinctByKey with DataFrames?
I know I can map to pair RDD but I'm wondering if there is a simpler
method? If someone knows whether this operation is expressible in SQL, that
information would be most appreciated as well.
Instead of using that RDD operation, just use the native DataFrame function
approxCountDistinct:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
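
To make the suggestion concrete, here is a minimal sketch of a per-key approximate distinct count done with groupBy and approxCountDistinct instead of a pair RDD. It assumes a Spark 1.4-era setup (SQLContext, local master); the object name, the sample rows, and the column names "key" and "value" are made up for illustration and are not from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.approxCountDistinct

// Hypothetical driver object; the name is illustrative only.
object ApproxDistinctByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("approx-distinct-by-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumed sample data: (key, value) pairs standing in for the real DataFrame.
    val df = sc.parallelize(Seq(("a", 1), ("a", 2), ("a", 2), ("b", 7)))
      .toDF("key", "value")

    // Group by the key column and estimate the number of distinct values per key.
    // This plays the role of countApproxDistinctByKey on a pair RDD.
    val counts = df.groupBy("key")
      .agg(approxCountDistinct($"value").as("approx_distinct"))

    counts.show()
    sc.stop()
  }
}
```

approxCountDistinct also has an overload that takes a second argument, rsd, for the maximum estimation error allowed. As for SQL, later Spark releases expose the same aggregate as the approx_count_distinct function; whether it is available from SQL in your version is worth checking against the docs for that release.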
On Thu, Jul 16, 2015 at 6:58 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
> Hi, could someone point me to