Ted Malaska created SPARK-9237:
----------------------------------
Summary: Added Top N Column Values for DataFrames
Key: SPARK-9237
URL: https://issues.apache.org/jira/browse/SPARK-9237
Project: Spark
Issue Type: Improvement
Reporter: Ted Malaska
Priority: Minor
This JIRA is to add a very common data quality check to DataFrames.
A quick outline of this functionality can be seen in the following blog post:
http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
There are two parts to this JIRA:
1. How to implement the Top N count. I will start with the implementation
from the blog post.
2. Where to add the function: either directly on DataFrame, in DataFrame
describe, or in DataFrameStatFunctions. I will start by putting it into
DataFrameStatFunctions.
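
To make part 1 concrete, here is a minimal sketch of what a Top N count would return for each column. It is plain Python using only the standard library, not Spark code and not the implementation from the blog post. The function name top_n_column_values and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def top_n_column_values(
    rows: List[Dict[str, Any]], n: int
) -> Dict[str, List[Tuple[Any, int]]]:
    """For each column, return the n most frequent values with their counts.

    Hypothetical helper for illustration only; rows are plain dicts that
    stand in for DataFrame rows.
    """
    counters: Dict[str, Counter] = {}
    for row in rows:
        for column, value in row.items():
            # Keep one frequency table per column.
            counters.setdefault(column, Counter())[value] += 1
    # most_common(n) sorts by count, highest first, and keeps the top n.
    return {column: counter.most_common(n) for column, counter in counters.items()}


# Made-up sample data.
rows = [
    {"city": "NYC", "status": "ok"},
    {"city": "NYC", "status": "ok"},
    {"city": "SF", "status": "error"},
    {"city": "NYC", "status": "ok"},
    {"city": "SF", "status": "ok"},
    {"city": "LA", "status": "error"},
]

result = top_n_column_values(rows, n=2)
print(result["city"])    # [('NYC', 3), ('SF', 2)]
print(result["status"])  # [('ok', 4), ('error', 2)]
```

A distributed version would have to merge per-partition counts before taking the top N, which this single-process sketch does not attempt.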
Please let me know if you have any input.
Thanks