This is an automated email from the ASF dual-hosted git repository.
joshinnis pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-age-website.git
The following commit(s) were added to refs/heads/master by this push:
new bf32017 Updated Aggregation documentation
bf32017 is described below
commit bf32017e0dfc2804fa9da06deaf36481ed84aadb
Author: Josh Innis <[email protected]>
AuthorDate: Thu Oct 21 16:03:27 2021 -0700
Updated Aggregation documentation
---
docs/conf.py | 2 +-
docs/functions/aggregate_functions.md | 23 ++-
docs/intro/aggregation.md | 308 +++++++++++++++++++++++++++++++++-
3 files changed, 325 insertions(+), 8 deletions(-)
diff --git a/docs/conf.py b/docs/conf.py
index b5d4e69..684e72a 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -57,7 +57,7 @@ html_theme = 'sphinx_rtd_theme'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static', 'css']
+html_static_path = ['_static', 'css', 'scripts']
html_logo = 'images/logo.png'
html_favicon = 'images/favicon.ico'
diff --git a/docs/functions/aggregate_functions.md
b/docs/functions/aggregate_functions.md
index d08f0cf..2c6bdd0 100644
--- a/docs/functions/aggregate_functions.md
+++ b/docs/functions/aggregate_functions.md
@@ -1,6 +1,27 @@
# Aggregation Functions
-<a href="../scripts/functions/aggregation_setup.sql">Data Setup Script</a>
+Functions that activate [auto aggregation](../intro/aggregation.html#).
+
+## Data Setup
+```
+LOAD 'age';
+SET search_path TO ag_catalog;
+
+SELECT create_graph('graph_name');
+
+SELECT * FROM cypher('graph_name', $$
+ CREATE (a:Person {name: 'A', age: 13}),
+ (b:Person {name: 'B', age: 33, eyes: "blue"}),
+ (c:Person {name: 'C', age: 44, eyes: "blue"}),
+ (d1:Person {name: 'D', eyes: "brown"}),
+ (d2:Person {name: 'D'}),
+ (a)-[:KNOWS]->(b),
+ (a)-[:KNOWS]->(c),
+ (a)-[:KNOWS]->(d1),
+ (b)-[:KNOWS]->(d2),
+ (c)-[:KNOWS]->(d2)
+$$) as (a agtype);
+```
## min()
diff --git a/docs/intro/aggregation.md b/docs/intro/aggregation.md
index 2c2d80d..d49c8f1 100644
--- a/docs/intro/aggregation.md
+++ b/docs/intro/aggregation.md
@@ -1,13 +1,309 @@
# Aggregation
-Generally an aggregation aggr(expr) processes all matching rows for each
aggregation key found in an incoming record (keys are compared using
equivalence).
-For a fixed aggregation key and each matching record, expr is evaluated to a
value. This yields a listof candidate values. Generally the order of
candidate values is unspecified. If the aggregation happens in a
projection with an associated ORDER BY subclause, the list of
candidate values isordered in the same way as the underlying records
and as specified by the associated ORDER BY subclause.
+## Introduction
+
+Generally an aggregation aggr(expr) processes all matching rows for each
aggregation key found in an incoming record (keys are compared using
[equivalence](./comparability.html#)).
+
+In a regular aggregation (i.e. of the form aggr(expr)), the list of aggregated
values is the list of candidate values with all null values removed from it.
+
+## Data Setup
+
+```
+SELECT * FROM cypher('graph_name', $$
+ CREATE (a:Person {name: 'A', age: 13}),
+ (b:Person {name: 'B', age: 33, eyes: "blue"}),
+ (c:Person {name: 'C', age: 44, eyes: "blue"}),
+ (d1:Person {name: 'D', eyes: "brown"}),
+ (d2:Person {name: 'D'}),
+ (a)-[:KNOWS]->(b),
+ (a)-[:KNOWS]->(c),
+ (a)-[:KNOWS]->(d1),
+ (b)-[:KNOWS]->(d2),
+ (c)-[:KNOWS]->(d2)
+$$) as (a agtype);
+```
+
+## Auto Group By
+To calculate aggregated data, Cypher offers aggregation, analogous to SQL’s
GROUP BY.
+
+Aggregating functions take a set of values and calculate An aggregated value
over them. Examples are [avg()](../functions/aggregate_functions.html#avg) that
calculates the average of multiple numeric values, or
[min()](../functions/aggregate_functions.html#min) that finds the smallest
numeric or string value in a set of values. When we say below that an
aggregating function operates on a set of values, we mean these to be the
result of the application of the inner expression(such as n.a [...]
+
+Aggregation can be computed over all the matching subgraphs, or it can be
further divided by introducing grouping keys. These are non-aggregate
expressions, that are used to group the valuesgoing into the aggregate
functions.
+
+Assume we have the following return statement:
+```
+SELECT * FROM cypher('graph_name', $$
+ MATCH (v:Person)
+ RETURN v.name, count(*)
+$$) as (grouping_key agtype, count agtype);
+```
+
+<table>
+ <tr>
+ <td>count</td>
+ <td>key</td>
+ </tr>
+ <tr>
+ <td>"A"</td>
+ <td>1</td>
+ </tr>
+ <tr>
+ <td>"B"</td>
+ <td>1</td>
+ </tr>
+ <tr>
+ <td>"C"</td>
+ <td>1</td>
+ </tr>
+ <tr>
+ <td>"D"</td>
+ <td>2</td>
+ </tr>
+ <tr>
+ <td colspan="2">1 row</td>
+ </tr>
+</table>
+
+
+We have two return expressions: grouping_key, and count(*). The first,
grouping_key, is not an aggregate function, and so it will be the grouping
key. The latter, count(*) is an aggregate expression. The matching subgraphs
will be divided into different buckets, depending on the grouping key. The
aggregate function will then be run on these buckets, calculating an aggregate
value per bucket.
+
+## Sorting on aggregate functions
+
+To use aggregations to sort the result set, the aggregation must be included
in the RETURN to be used in the ORDER BY.
+
+```
+SELECT *
+FROM cypher('graph_name', $$
+ MATCH (me:Person)-[]->(friend:Person)
+ RETURN count(friend), me
+ ORDER BY count(friend)
+$$) as (friends agtype, me agtype);
+```
+
+## Distinct aggregation
+In a distinct aggregation (i.e. of the form aggr(DISTINCT expr)), the list of
aggregated values is the list of candidate values with all null values removed
from it. Furthermore, in a distinct aggregation, only one of all equivalent
candidate values is included in the list of aggregated values, i.e. duplicates
under equivalence are removed.
+
+
+The DISTINCT operator works in conjunction with aggregation. It is used to
make all values unique before running them through an aggregate function.
+
+```
+SELECT *
+FROM cypher('graph_name', $$
+ MATCH (v:Person)
+ RETURN count(DISTINCT v.eyes), count(v.eyes)
+$$) as (distinct_eyes agtype, eyes agtype);
+```
+
+<table>
+ <tr>
+ <td>distinct_eyes</td>
+ <td>eyes</td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <td>3</td>
+ </tr>
+ <tr>
+ <td colspan="2">1 row</td>
+ </tr>
+</table>
+
+## Ambiguous Grouping Statements
+
+This feature of not requiring the user to specifiy their grouping keys for a
query allows for ambiguity on what Cypher should qualify as their grouping
keys. For more details [click
here.](https://opencypher.org/articles/2017/07/27/ocig1-aggregations-article/)
+
+Data Setup
+```
+SELECT * FROM cypher('graph_name', $$
+CREATE (:L {a: 1, b: 2, c: 3}),
+ (:L {a: 2, b: 3, c: 1}),
+ (:L {a: 3, b: 1, c: 2})
+$$) as (a agtype);
+```
+
+### Invalid Query in AGE
+AGE's solution to this problem is to not allow a WITH or RETURN column to
combine aggregate functions with variables that are not explicitly listed in
another column of the same WITH or RETURN clause.
+
+
+
+Query:
+```
+SELECT * FROM cypher('graph_name', $$
+ MATCH (x:L)
+ RETURN x.a + count(*) + x.b + count(*) + x.c
+$$) as (a agtype);
+```
+
+Result:
+```
+ERROR: "x" must be either part of an explicitly listed key or used inside an
aggregate function
+LINE 3: RETURN x.a + count(*) + x.b + count(*) + x.c
+```
+
+
+### Valid Query in AGE
+Columns that do not include an aggregate function in AGE are considered to be
the grouping keys for that WITH or RETURN clause.
+
+For the above query, the user could rewrite the query is several ways that
will return results
+
+Query:
+```
+SELECT * FROM cypher('graph_name', $$
+ MATCH (x:L)
+ RETURN (x.a + x.b + x.c) + count(*) + count(*), x.a + x.b + x.c
+$$) as (count agtype, key agtype);
+```
+
+x.a + x.b + x.c is the grouping key. Grouping keys created like this must
include parenthesis.
+
+Results
+<table>
+ <tr>
+ <td>count</td>
+ <td>key</td>
+ </tr>
+ <tr>
+ <td>12</td>
+ <td>6</td>
+ </tr>
+ <tr>
+ <td colspan="2">1 row</td>
+ </tr>
+</table>
+
+
+
+Query
+```
+SELECT * FROM cypher('graph_name', $$
+ MATCH (x:L)
+ RETURN x.a + count(*) + x.b + count(*) + x.c, x.a, x.b, x.c
+$$) as (count agtype, a agtype, b agtype, c agtype);
+```
+
+x.a, x.b, and x.c will be considered different grouping keys
+
+Results:
+
+<table>
+ <thead>
+ <tr>
+ <td>count</td>
+ </td>a<td>
+ </td>b<td>
+ </td>c<td>
+ </tr>
+ </thead>
+ <tr>
+ <td>8</td>
+ <td>3</td>
+ <td>1</td>
+ <td>2</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>2</td>
+ <td>3</td>
+ <td>1</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>1</td>
+ <td>2</td>
+ <td>3</td>
+ </tr>
+ <tr>
+ <td colspan="4">3 rows</td>
+ </tr>
+</table>
+
+### Vertices and edges in ambiguous grouping
+
+Alternatively, the grouping key can be a vertex or edge, and then any
properties of the vertex or edge can be specified without being explicitly
stated in a WITH or RETURN column.
+
+```
+SELECT * FROM cypher('graph_name', $$
+ MATCH (x:L)
+ RETURN count(*) + count(*) + x.a + x.b + x.c, x
+$$) as (count agtype, key agtype);
+```
+
+Results will be grouped on x, because it is safe to assume that properties be
considered unecessary for grouping to be unambiguous.
+
+Results
+<table>
+ <thead>
+ <tr>
+ <td>count</td>
+ </td>key<td>
+ </tr>
+ </thead>
+ <tr>
+ <td>8</td>
+ <td>{"id": 1407374883553283, "label": "L", "properties": {"a": 3, "b": 1,
"c": 2}}::vertex</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>{"id": 1407374883553281, "label": "L", "properties": {"a": 1, "b": 2,
"c": 3}}::vertex</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>{"id": 1407374883553282, "label": "L", "properties": {"a": 2, "b": 3,
"c": 1}}::vertex</td>
+ </tr>
+ <tr>
+ <td colspan="4">3 rows</td>
+ </tr>
+</table>
+
+
+### Hiding unwanted grouping keys
+
+If the grouping key is considered unecessary for the query output, the
aggregation can be done in a WITH clause then passing information to the RETURN
clause.
+
+SELECT * FROM cypher('graph_name', $$
+ MATCH (x:L)
+ WITH count(*) + count(*) + x.a + x.b + x.c as column, x
+ RETURN column
+$$) as (a agtype);
+
+Results
+<table>
+ <thead>
+ <tr>
+ <td>a</td>
+ </tr>
+ </thead>
+ <tr>
+ <td>8</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ </tr>
+ <tr>
+ <td colspan="1">3 rows</td>
+ </tr>
+</table>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-In a regular aggregation (i.e. of the form aggr(expr)), the list of
aggregated values is the list of candidate values with all null values
removed from it.In a distinct aggregation (i.e. of the form aggr(DISTINCT
expr)), the list of aggregated values is the listof candidate values with
all null values removed from it. Furthermore, in a distinct
aggregation,only one of all equivalent candidate values is included in
the list of aggregated values [...]
-Finally, the remaining aggregated values are processed by the actual
aggregation function. If the list of aggregated values is empty, the
aggregation function returns a default value (null unlessspecified
otherwise below). Aggregating values of different types (like summing
a number and astring) may lead to runtime errors.
-See
-aggregation functions for more details.