This is an automated email from the ASF dual-hosted git repository.

houqp pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
     new a09e1ae  add docs for approx functions (#2082)
a09e1ae is described below

commit a09e1aeb5fa279e2a14554c3dad9dfb17d9326e7
Author: Rich <[email protected]>
AuthorDate: Sun Mar 27 16:36:29 2022 -0400

    add docs for approx functions (#2082)
    
    Co-authored-by: Andrew Lamb <[email protected]>
---
 docs/source/user-guide/sql/aggregate_functions.md | 62 +++++++++++++++++++++++
 docs/source/user-guide/sql/index.rst              |  1 +
 docs/source/user-guide/sql/sql_status.md          |  3 ++
 3 files changed, 66 insertions(+)

diff --git a/docs/source/user-guide/sql/aggregate_functions.md b/docs/source/user-guide/sql/aggregate_functions.md
new file mode 100644
index 0000000..d3472a7
--- /dev/null
+++ b/docs/source/user-guide/sql/aggregate_functions.md
@@ -0,0 +1,62 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Aggregate Functions
+
+Aggregate functions operate on a set of values to compute a single result. Please refer to [PostgreSQL](https://www.postgresql.org/docs/current/functions-aggregate.html) for usage of standard SQL functions.
+
+## General
+
+- min
+- max
+- count
+- avg
+- sum
+- array_agg
+
+## Statistical
+
+- var / var_samp / var_pop
+- stddev / stddev_samp / stddev_pop
+- covar / covar_samp / covar_pop
+- corr
+
+## Approximate
+
+### approx_distinct
+
+`approx_distinct(x) -> uint64` returns the approximate number (HyperLogLog) of distinct input values.
+
+### approx_median
+
+`approx_median(x) -> x` returns the approximate median of input values. It is an alias of `approx_percentile_cont(x, 0.5)`.
+
+### approx_percentile_cont
+
+`approx_percentile_cont(x, p) -> x` returns the approximate percentile (TDigest) of input values, where `p` is a float64 between 0 and 1 (inclusive).
+
+It supports raw data as input and builds TDigest sketches at query time, and is approximately equal to `approx_percentile_cont_with_weight(x, 1, p)`.
+
+### approx_percentile_cont_with_weight
+
+`approx_percentile_cont_with_weight(x, w, p) -> x` returns the approximate percentile (TDigest) of input values with weight, where `w` is the weight column expression and `p` is a float64 between 0 and 1 (inclusive).
+
+It supports raw data or pre-aggregated TDigest sketches as input, and builds or merges TDigest sketches at query time. A TDigest sketch is a list of centroids `(x, w)`, where `x` is the mean and `w` is the weight.
+
+It is suitable for low-latency OLAP systems where a streaming compute engine (e.g. Spark Streaming/Flink) pre-aggregates data into a data store, which is then queried using DataFusion.
diff --git a/docs/source/user-guide/sql/index.rst b/docs/source/user-guide/sql/index.rst
index fc96acc..f6d3a0b 100644
--- a/docs/source/user-guide/sql/index.rst
+++ b/docs/source/user-guide/sql/index.rst
@@ -24,4 +24,5 @@ SQL Reference
    sql_status
    select
    ddl
+   aggregate_functions
    DataFusion Functions <datafusion-functions>
diff --git a/docs/source/user-guide/sql/sql_status.md b/docs/source/user-guide/sql/sql_status.md
index a8ecc5e..4b33690 100644
--- a/docs/source/user-guide/sql/sql_status.md
+++ b/docs/source/user-guide/sql/sql_status.md
@@ -76,6 +76,9 @@
   - [x] nullif
 - Approximation functions
   - [x] approx_distinct
+  - [x] approx_median
+  - [x] approx_percentile_cont
+  - [x] approx_percentile_cont_with_weight
 - Common date/time functions
   - [ ] Basic date functions
   - [ ] Basic time functions

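Note on the centroid representation documented above: the `approx_percentile_cont_with_weight` section describes a sketch as a list of `(x, w)` centroids. The following is a minimal Python illustration of how a percentile can be read from such a list. It is not DataFusion's TDigest implementation: it treats every centroid as a point mass at its mean and does no merging or interpolation. The function name `weighted_percentile` and the sample data are hypothetical.

```python
from typing import List, Tuple


def weighted_percentile(centroids: List[Tuple[float, float]], p: float) -> float:
    """Estimate the p-th percentile (0 <= p <= 1) from (mean, weight) centroids.

    Simplified illustration only: each centroid is treated as a point mass
    located at its mean. The result is the mean of the first centroid whose
    cumulative weight reaches p * total_weight.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1 (inclusive)")
    if not centroids:
        raise ValueError("need at least one centroid")

    ordered = sorted(centroids)            # sort by mean
    total = sum(w for _, w in ordered)     # total weight of the sketch
    target = p * total                     # cumulative weight to reach

    cumulative = 0.0
    for mean, weight in ordered:
        cumulative += weight
        if cumulative >= target:
            return mean
    return ordered[-1][0]                  # guard against rounding at p == 1


# Raw values correspond to centroids of weight 1, which mirrors the
# documented relation between approx_percentile_cont(x, p) and
# approx_percentile_cont_with_weight(x, 1, p).
raw = [5.0, 1.0, 3.0]
unit_centroids = [(v, 1.0) for v in raw]

# Pre-aggregated input: most of the weight sits on the centroid at 20.
pre_aggregated = [(10.0, 1.0), (20.0, 8.0), (30.0, 1.0)]

median_raw = weighted_percentile(unit_centroids, 0.5)
median_weighted = weighted_percentile(pre_aggregated, 0.5)
```

In this simplified model the weight-1 case reduces to picking an order statistic of the raw values, while the pre-aggregated case lets a heavy centroid dominate the result. A real TDigest additionally bounds the number of centroids and interpolates between them.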