gagan taneja created SPARK-18940:
------------------------------------
Summary: Percentile and approximate percentile support for
frequency distribution table
Key: SPARK-18940
URL: https://issues.apache.org/jira/browse/SPARK-18940
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.2
Reporter: gagan taneja
I have a frequency distribution table with the following entries:
Age, Number of persons
21, 10
22, 15
23, 18
..
..
30, 14
It is common to have data in frequency distribution format and to then
calculate the percentile or median from it. With the current implementation,
finding the percentile of such data is difficult and complex.
I therefore propose enhancing the current Percentile and Approx
Percentile implementations to take a frequency distribution column into
consideration.
Current Percentile definition:

percentile(col, array(percentage1 [, percentage2]...))

case class Percentile(
    child: Expression,
    percentageExpression: Expression,
    mutableAggBufferOffset: Int = 0,
    inputAggBufferOffset: Int = 0) {
  def this(child: Expression, percentageExpression: Expression) = {
    this(child, percentageExpression, 0, 0)
  }
}
Proposed changes:

percentile(col, [frequency], array(percentage1 [, percentage2]...))

case class Percentile(
    child: Expression,
    frequency: Expression,
    percentageExpression: Expression,
    mutableAggBufferOffset: Int = 0,
    inputAggBufferOffset: Int = 0) {
  def this(child: Expression, percentageExpression: Expression) = {
    this(child, Literal(1L), percentageExpression, 0, 0)
  }
  def this(
      child: Expression,
      frequency: Expression,
      percentageExpression: Expression) = {
    this(child, frequency, percentageExpression, 0, 0)
  }
}
Although this definition will differ from the Hive implementation, it will be
useful functionality for many Spark users.
Moreover, the changes are local to the Percentile and ApproxPercentile
implementations.
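
To illustrate the intended semantics, here is a minimal plain-Scala sketch of
an exact percentile computed directly from (value, frequency) pairs, without
expanding the table into one row per person. It is not the Spark
implementation: the names FrequencyPercentile and percentile are hypothetical,
and the interpolation rule (linear interpolation at the 0-based position
p * (N - 1), where N is the sum of the frequencies) is an assumption made for
this sketch.

```scala
object FrequencyPercentile {
  /**
   * Exact percentile over (value, frequency) pairs; p is a fraction in [0, 1].
   * Hypothetical helper for illustration only, not part of Spark.
   */
  def percentile(freqTable: Seq[(Double, Long)], p: Double): Option[Double] = {
    require(p >= 0.0 && p <= 1.0, "percentage must be in [0, 1]")
    // Ignore rows with non-positive frequency and order the rest by value.
    val sorted = freqTable.filter(_._2 > 0).sortBy(_._1)
    if (sorted.isEmpty) {
      None
    } else {
      // cumulative(i) = number of observations with value <= sorted(i)._1
      val cumulative = sorted.scanLeft(0L)(_ + _._2).tail
      val total = cumulative.last
      // Assumed rule: fractional 0-based position among `total` observations.
      val position = p * (total - 1)
      val lower = math.floor(position).toLong
      val upper = math.ceil(position).toLong
      // Value at a 0-based rank: first row whose cumulative count exceeds it.
      def valueAt(rank: Long): Double =
        sorted(cumulative.indexWhere(_ > rank))._1
      val lo = valueAt(lower)
      val hi = valueAt(upper)
      Some(lo + (position - lower) * (hi - lo))
    }
  }
}

// Usage with only the three rows listed explicitly in the table above:
val ages = Seq((21.0, 10L), (22.0, 15L), (23.0, 18L))
println(FrequencyPercentile.percentile(ages, 0.5)) // Some(22.0)
```

With a frequency of 1 for every row this reduces to an ordinary percentile
over the values, which is the behaviour the Literal(1L) default in the
proposed two-argument constructor is meant to preserve.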