[incubator-age-website] branch master updated: Updated Aggregation documentation

joshinnis Thu, 21 Oct 2021 16:04:54 -0700

This is an automated email from the ASF dual-hosted git repository.

joshinnis pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-age-website.git



The following commit(s) were added to refs/heads/master by this push:
     new bf32017  Updated Aggregation documentation
bf32017 is described below

commit bf32017e0dfc2804fa9da06deaf36481ed84aadb
Author: Josh Innis <[email protected]>
AuthorDate: Thu Oct 21 16:03:27 2021 -0700

    Updated Aggregation documentation
---
 docs/conf.py                          |   2 +-
 docs/functions/aggregate_functions.md |  23 ++-
 docs/intro/aggregation.md             | 308 +++++++++++++++++++++++++++++++++-
 3 files changed, 325 insertions(+), 8 deletions(-)

diff --git a/docs/conf.py b/docs/conf.py
index b5d4e69..684e72a 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -57,7 +57,7 @@ html_theme = 'sphinx_rtd_theme'
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static', 'css']
+html_static_path = ['_static', 'css', 'scripts']
 
 html_logo = 'images/logo.png'
 html_favicon = 'images/favicon.ico'
diff --git a/docs/functions/aggregate_functions.md 
b/docs/functions/aggregate_functions.md
index d08f0cf..2c6bdd0 100644
--- a/docs/functions/aggregate_functions.md
+++ b/docs/functions/aggregate_functions.md
@@ -1,6 +1,27 @@
 # Aggregation Functions
 
-<a href="../scripts/functions/aggregation_setup.sql">Data Setup Script</a>
+Functions that activate [auto aggregation](../intro/aggregation.html#).
+
+## Data Setup
+```
+LOAD 'age';
+SET search_path TO ag_catalog;
+
+SELECT create_graph('graph_name');
+
+SELECT * FROM cypher('graph_name', $$
+       CREATE (a:Person {name: 'A', age: 13}),
+       (b:Person {name: 'B', age: 33, eyes: "blue"}),
+       (c:Person {name: 'C', age: 44, eyes: "blue"}),
+       (d1:Person {name: 'D', eyes: "brown"}),
+       (d2:Person {name: 'D'}),
+       (a)-[:KNOWS]->(b),
+       (a)-[:KNOWS]->(c),
+       (a)-[:KNOWS]->(d1),
+       (b)-[:KNOWS]->(d2),
+       (c)-[:KNOWS]->(d2)
+$$) as (a agtype);
+```
 
 ## min()
 
diff --git a/docs/intro/aggregation.md b/docs/intro/aggregation.md
index 2c2d80d..d49c8f1 100644
--- a/docs/intro/aggregation.md
+++ b/docs/intro/aggregation.md
@@ -1,13 +1,309 @@
 # Aggregation
 
-Generally an aggregation aggr(expr) processes all matching rows for each 
aggregation key found in an incoming record (keys are compared using 
equivalence).
 
-For a fixed aggregation key and each matching record, expr is evaluated to a 
value. This yields a listof  candidate  values.  Generally  the  order  of  
candidate  values  is  unspecified.  If  the  aggregation happens  in  a  
projection  with  an  associated  ORDER  BY  subclause,  the  list  of  
candidate  values  isordered  in  the  same  way  as  the  underlying  records  
and  as  specified  by  the  associated  ORDER  BY subclause.
+## Introduction 
+
+Generally an aggregation aggr(expr) processes all matching rows for each 
aggregation key found in an incoming record (keys are compared using 
[equivalence](./comparability.html#)).
+
+In a regular aggregation (i.e. of the form aggr(expr)), the list of aggregated 
values is the list of candidate values with all null values removed from it.
+
+## Data Setup
+
+```
+SELECT * FROM cypher('graph_name', $$
+       CREATE (a:Person {name: 'A', age: 13}),
+       (b:Person {name: 'B', age: 33, eyes: "blue"}),
+       (c:Person {name: 'C', age: 44, eyes: "blue"}),
+       (d1:Person {name: 'D', eyes: "brown"}),
+       (d2:Person {name: 'D'}),
+       (a)-[:KNOWS]->(b),
+       (a)-[:KNOWS]->(c),
+       (a)-[:KNOWS]->(d1),
+       (b)-[:KNOWS]->(d2),
+       (c)-[:KNOWS]->(d2)
+$$) as (a agtype);
+```
+
+## Auto Group By
+To calculate aggregated data, Cypher offers aggregation, analogous to SQL’s 
GROUP BY.
+
+Aggregating functions take a  set of values and calculate An aggregated value 
over them. Examples are [avg()](../functions/aggregate_functions.html#avg) that 
calculates the average of multiple numeric values, or 
[min()](../functions/aggregate_functions.html#min) that finds the smallest 
numeric or string value in a set of values. When we say below that an 
aggregating function operates on a set of values, we mean these to be the 
result of the application of the inner expression(such as n.a [...]
+
+Aggregation can be computed over all the matching subgraphs, or it can be 
further divided by introducing grouping keys. These are non-aggregate 
expressions, that are used to group the valuesgoing into the aggregate 
functions.
+
+Assume we have the following return statement:
+```
+SELECT * FROM cypher('graph_name', $$
+       MATCH (v:Person)
+       RETURN v.name, count(*)
+$$) as (grouping_key agtype, count agtype);
+```
+
+<table>
+  <tr>
+   <td>count</td>
+   <td>key</td>
+  </tr>
+  <tr>
+   <td>"A"</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>"B"</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>"C"</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>"D"</td>
+   <td>2</td>
+  </tr>
+  <tr>
+   <td colspan="2">1 row</td>
+  </tr>
+</table>
+
+
+We have two return expressions: grouping_key, and count(*). The first, 
grouping_key, is not an aggregate function, and so it will  be  the  grouping  
key. The latter, count(*) is an aggregate expression. The matching subgraphs 
will be divided into different  buckets, depending on the grouping key. The 
aggregate function will then be run on these buckets, calculating an aggregate 
value per bucket. 
+
+## Sorting on aggregate functions
+
+To use aggregations to sort the result set, the aggregation must be included 
in the RETURN to be used in the ORDER BY.
+
+```
+SELECT *
+FROM cypher('graph_name', $$
+       MATCH (me:Person)-[]->(friend:Person)
+       RETURN count(friend), me
+       ORDER BY count(friend)
+$$) as (friends agtype, me agtype);
+```
+
+## Distinct aggregation
+In a distinct aggregation (i.e. of the form aggr(DISTINCT expr)), the list of 
aggregated values is the list of candidate values with all null values  removed 
from it. Furthermore, in a distinct aggregation, only one of all equivalent 
candidate values is included in the list of aggregated values, i.e. duplicates 
under equivalence are  removed. 
+
+
+The DISTINCT operator works in conjunction with aggregation. It is used to 
make all values unique before running them  through an aggregate function.
+
+```
+SELECT *
+FROM cypher('graph_name', $$
+       MATCH (v:Person)
+       RETURN count(DISTINCT v.eyes), count(v.eyes)
+$$) as (distinct_eyes agtype, eyes agtype);
+```
+
+<table>
+  <tr>
+   <td>distinct_eyes</td>
+   <td>eyes</td>
+  </tr>
+  <tr>
+   <td>2</td>
+   <td>3</td>
+  </tr>
+  <tr>
+   <td colspan="2">1 row</td>
+  </tr>
+</table>
+
+## Ambiguous Grouping Statements
+
+This feature of not requiring the user to specifiy their grouping keys for a 
query allows for ambiguity on what Cypher should qualify as their grouping 
keys. For more details [click 
here.](https://opencypher.org/articles/2017/07/27/ocig1-aggregations-article/)
+
+Data Setup 
+```
+SELECT * FROM cypher('graph_name', $$
+CREATE (:L {a: 1, b: 2, c: 3}),
+       (:L {a: 2, b: 3, c: 1}),
+       (:L {a: 3, b: 1, c: 2})
+$$) as (a agtype);
+```
+
+### Invalid Query in AGE
+AGE's solution to this problem is to not allow a WITH or RETURN column to 
combine aggregate functions with variables that are not explicitly listed in 
another column of the same WITH or RETURN clause.
+
+
+
+Query:
+```
+SELECT * FROM cypher('graph_name', $$
+       MATCH (x:L)
+       RETURN x.a + count(*) + x.b + count(*) + x.c
+$$) as (a agtype);
+```
+
+Result:
+```
+ERROR:  "x" must be either part of an explicitly listed key or used inside an 
aggregate function
+LINE 3: RETURN x.a + count(*) + x.b + count(*) + x.c
+```
+
+
+### Valid Query in AGE
+Columns that do not include an aggregate function in AGE are considered to be 
the grouping keys for that WITH or RETURN clause. 
+
+For the above query, the user could rewrite the query is several ways that 
will return results
+
+Query:
+```
+SELECT * FROM cypher('graph_name', $$
+       MATCH (x:L)
+       RETURN (x.a + x.b + x.c) + count(*) + count(*), x.a + x.b + x.c
+$$) as (count agtype, key agtype);
+```
+
+x.a + x.b + x.c is the grouping key. Grouping keys created like this must 
include parenthesis.
+
+Results
+<table>
+  <tr>
+   <td>count</td>
+   <td>key</td>
+  </tr>
+  <tr>
+   <td>12</td>
+   <td>6</td>
+  </tr>
+  <tr>
+   <td colspan="2">1 row</td>
+  </tr>
+</table>
+
+
+
+Query
+```
+SELECT * FROM cypher('graph_name', $$
+       MATCH (x:L)
+       RETURN x.a + count(*) + x.b + count(*) + x.c, x.a, x.b, x.c
+$$) as (count agtype, a agtype, b agtype, c agtype);
+```
+
+x.a, x.b, and x.c will be considered different grouping keys
+
+Results:
+
+<table>
+  <thead>
+  <tr>
+   <td>count</td>
+   </td>a<td>
+   </td>b<td>
+   </td>c<td>
+  </tr>
+  </thead>
+  <tr>
+   <td>8</td>
+   <td>3</td>
+   <td>1</td>
+   <td>2</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>2</td>
+   <td>3</td>
+   <td>1</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>1</td>
+   <td>2</td>
+   <td>3</td>
+  </tr>
+  <tr>
+   <td colspan="4">3 rows</td>
+  </tr>
+</table>
+
+### Vertices and edges in ambiguous grouping
+
+Alternatively, the grouping key can be a vertex or edge, and then any 
properties of the vertex or edge can be specified without being explicitly 
stated in a WITH or RETURN column.
+
+```
+SELECT * FROM cypher('graph_name', $$
+       MATCH (x:L)
+       RETURN count(*) + count(*) + x.a + x.b + x.c, x
+$$) as (count agtype, key agtype);
+```
+
+Results will be grouped on x, because it is safe to assume that properties be 
considered unecessary for grouping to be unambiguous.
+
+Results
+<table>
+  <thead>
+  <tr>
+   <td>count</td>
+   </td>key<td>
+  </tr>
+  </thead>
+  <tr>
+   <td>8</td>
+   <td>{"id": 1407374883553283, "label": "L", "properties": {"a": 3, "b": 1, 
"c": 2}}::vertex</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>{"id": 1407374883553281, "label": "L", "properties": {"a": 1, "b": 2, 
"c": 3}}::vertex</td>
+  </tr>
+  <tr>
+   <td>8</td>
+   <td>{"id": 1407374883553282, "label": "L", "properties": {"a": 2, "b": 3, 
"c": 1}}::vertex</td>
+  </tr>
+  <tr>
+   <td colspan="4">3 rows</td>
+  </tr>
+</table>
+
+
+### Hiding unwanted grouping keys
+
+If the grouping key is considered unecessary for the query output, the 
aggregation can be done in a WITH clause then passing information to the RETURN 
clause.
+
+SELECT * FROM cypher('graph_name', $$
+       MATCH (x:L)
+       WITH count(*) + count(*) + x.a + x.b + x.c as column, x
+       RETURN column
+$$) as (a agtype);
+
+Results
+<table>
+  <thead>
+  <tr>
+   <td>a</td>
+  </tr>
+  </thead>
+  <tr>
+   <td>8</td>
+  </tr>
+  <tr>
+   <td>8</td>
+  </tr>
+  <tr>
+   <td>8</td>
+  </tr>
+  <tr>
+   <td colspan="1">3 rows</td>
+  </tr>
+</table>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
-In  a  regular  aggregation  (i.e.  of  the  form  aggr(expr)),  the  list  of 
 aggregated  values  is  the  list  of candidate values with all null values 
removed from it.In a distinct aggregation (i.e. of the form aggr(DISTINCT 
expr)), the list of aggregated values is the listof  candidate  values  with  
all  null  values  removed  from  it.  Furthermore,  in  a  distinct  
aggregation,only  one  of  all  equivalent  candidate  values  is  included  in 
 the  list  of  aggregated  values [...]
 
-Finally, the remaining aggregated values are processed by the actual 
aggregation function. If the list of aggregated  values  is  empty,  the  
aggregation  function  returns  a  default  value  (null  unlessspecified  
otherwise  below).  Aggregating  values  of  different  types  (like  summing  
a  number  and  astring) may lead to runtime errors.
 
-See 
 
-aggregation functions for more details.

[incubator-age-website] branch master updated: Updated Aggregation documentation

Reply via email to