This is an automated email from the ASF dual-hosted git repository.
houqp pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push:
new a09e1ae add docs for approx functions (#2082)
a09e1ae is described below
commit a09e1aeb5fa279e2a14554c3dad9dfb17d9326e7
Author: Rich <[email protected]>
AuthorDate: Sun Mar 27 16:36:29 2022 -0400
add docs for approx functions (#2082)
Co-authored-by: Andrew Lamb <[email protected]>
---
docs/source/user-guide/sql/aggregate_functions.md | 62 +++++++++++++++++++++++
docs/source/user-guide/sql/index.rst | 1 +
docs/source/user-guide/sql/sql_status.md | 3 ++
3 files changed, 66 insertions(+)
diff --git a/docs/source/user-guide/sql/aggregate_functions.md b/docs/source/user-guide/sql/aggregate_functions.md
new file mode 100644
index 0000000..d3472a7
--- /dev/null
+++ b/docs/source/user-guide/sql/aggregate_functions.md
@@ -0,0 +1,62 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Aggregate Functions
+
+Aggregate functions operate on a set of values to compute a single result. Please refer to [PostgreSQL](https://www.postgresql.org/docs/current/functions-aggregate.html) for usage of standard SQL functions.
+
+## General
+
+- min
+- max
+- count
+- avg
+- sum
+- array_agg
+
+## Statistical
+
+- var / var_samp / var_pop
+- stddev / stddev_samp / stddev_pop
+- covar / covar_samp / covar_pop
+- corr
+
+## Approximate
+
+### approx_distinct
+
+`approx_distinct(x) -> uint64` returns the approximate number of distinct input values, estimated with HyperLogLog.
+
+### approx_median
+
+`approx_median(x) -> x` returns the approximate median of input values. It is an alias of `approx_percentile_cont(x, 0.5)`.
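To make the alias concrete, the sketch below computes an exact continuous percentile with linear interpolation and checks that `p = 0.5` gives the median. The helper name `percentile_cont` is hypothetical and the code is plain Python for illustration only; it is not DataFusion's implementation, which approximates the result with a TDigest rather than sorting all values.

```python
import statistics


def percentile_cont(values, p):
    """Exact continuous percentile using linear interpolation (illustrative helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1 (inclusive)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the requested percentile in the sorted data.
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    # Interpolate between the two neighbouring values.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


data = [7, 1, 5, 3]
# The 0.5 percentile and the median agree on the exact data.
exact_median = percentile_cont(data, 0.5)
assert exact_median == statistics.median(data)
```

The approximate functions trade this exactness for bounded memory, so their results may differ slightly from the exact value on large inputs.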
+
+### approx_percentile_cont
+
+`approx_percentile_cont(x, p) -> x` returns the approximate percentile (TDigest) of input values, where `p` is a float64 between 0 and 1 (inclusive).
+
+It takes raw data as input and builds TDigest sketches at query time, and is approximately equal to `approx_percentile_cont_with_weight(x, 1, p)`.
+
+### approx_percentile_cont_with_weight
+
+`approx_percentile_cont_with_weight(x, w, p) -> x` returns the approximate percentile (TDigest) of weighted input values, where `w` is the weight column expression and `p` is a float64 between 0 and 1 (inclusive).
+
+It accepts either raw data or pre-aggregated TDigest sketches as input, then builds or merges TDigest sketches at query time. A TDigest sketch is a list of centroids `(x, w)`, where `x` is the mean and `w` is the weight.
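As a rough illustration of the centroid representation, the sketch below shows how two `(mean, weight)` centroids combine into one and how raw values map to weight-1 centroids. The names `Centroid`, `merge_centroids`, and `from_raw` are hypothetical, and this is not DataFusion's code. A real TDigest also decides which centroids to merge based on a size limit, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Centroid:
    mean: float    # the `x` in `(x, w)`
    weight: float  # the `w` in `(x, w)`


def merge_centroids(a, b):
    """Combine two centroids: weights add, the mean is the weighted average."""
    total = a.weight + b.weight
    mean = (a.mean * a.weight + b.mean * b.weight) / total
    return Centroid(mean, total)


def from_raw(values):
    """Treat each raw value as a centroid of weight 1."""
    return [Centroid(float(v), 1.0) for v in values]


# A centroid at 10 with weight 1 merged with a centroid at 20 with weight 3.
merged = merge_centroids(Centroid(10.0, 1.0), Centroid(20.0, 3.0))
assert merged == Centroid(17.5, 4.0)
```

Treating raw values as weight-1 centroids is one way to read why the unweighted function is approximately the weighted one called with `w = 1`.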
+
+It is suitable for low-latency OLAP systems in which a streaming compute engine (e.g. Spark Streaming or Flink) pre-aggregates data into a data store that is then queried with DataFusion.
diff --git a/docs/source/user-guide/sql/index.rst b/docs/source/user-guide/sql/index.rst
index fc96acc..f6d3a0b 100644
--- a/docs/source/user-guide/sql/index.rst
+++ b/docs/source/user-guide/sql/index.rst
@@ -24,4 +24,5 @@ SQL Reference
sql_status
select
ddl
+ aggregate_functions
DataFusion Functions <datafusion-functions>
diff --git a/docs/source/user-guide/sql/sql_status.md b/docs/source/user-guide/sql/sql_status.md
index a8ecc5e..4b33690 100644
--- a/docs/source/user-guide/sql/sql_status.md
+++ b/docs/source/user-guide/sql/sql_status.md
@@ -76,6 +76,9 @@
- [x] nullif
- Approximation functions
- [x] approx_distinct
+ - [x] approx_median
+ - [x] approx_percentile_cont
+ - [x] approx_percentile_cont_with_weight
- Common date/time functions
- [ ] Basic date functions
- [ ] Basic time functions