Zhenhua Wang created SPARK-18000:

             Summary: Aggregation function for computing endpoints for numeric 
                 Key: SPARK-18000
                 URL: https://issues.apache.org/jira/browse/SPARK-18000
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Zhenhua Wang

For a column of numeric type (including date and timestamp), we will generate an 
equi-width or equi-height histogram, depending on whether its number of distinct 
values (ndv) is larger than the maximum number of bins allowed in one histogram 
(denoted as numBins).
This agg function computes values and their frequencies using a small hashmap, 
whose size is less than or equal to "numBins", and returns an equi-width 
histogram.
When the size of the hashmap exceeds "numBins", it clears the hashmap and uses 
ApproximatePercentile to return the endpoints of an equi-height histogram.
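
The two-phase behavior described above can be illustrated with a rough Python sketch. This is not the proposed Spark implementation: the function name histogram_endpoints is hypothetical, the input is assumed to be an in-memory sequence, and exact percentiles over a sorted copy stand in for ApproximatePercentile.

```python
from typing import Dict, List, Sequence, Union


def histogram_endpoints(
    values: Sequence[float], num_bins: int
) -> Union[Dict[float, int], List[float]]:
    """Hypothetical sketch of the aggregation described in this issue.

    Returns a value -> frequency map while the number of distinct values
    stays within num_bins; otherwise returns num_bins + 1 endpoints of an
    equi-height histogram.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")

    counts: Dict[float, int] = {}
    overflowed = False
    for v in values:
        if overflowed:
            break  # nothing more to count once the map has been dropped
        counts[v] = counts.get(v, 0) + 1
        if len(counts) > num_bins:
            # Too many distinct values: drop the map and fall back to
            # the percentile-based path.
            counts.clear()
            overflowed = True

    if not overflowed:
        # Small-ndv case: every distinct value with its frequency.
        return dict(sorted(counts.items()))

    # Assumption: exact percentiles replace ApproximatePercentile here.
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * (n - 1)) // num_bins] for i in range(num_bins + 1)]
```

With at most numBins distinct values the sketch returns the frequency map; with more, it returns the endpoints, so callers would need to check which shape came back.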
