EmmyMiao87 commented on code in PR #8927:
URL: https://github.com/apache/incubator-doris/pull/8927#discussion_r846032038
##########
new-docs/en/data-table/hit-the-rollup.md:
##########
@@ -26,20 +26,164 @@ under the License.
# Rollup and query
+ROLLUP in multidimensional analysis means "scroll up", which means that data
is aggregated further at a specified granularity.
+
+### Basic concepts
+
+In Doris, we make the table created by the user through the table building
statement a Base table. Base table holds the basic data stored in the way
specified by the user's table-building statement.
+
+On top of the Base table, we can create any number of ROLLUP tables. These
ROLLUP data are generated based on the Base table and physically **stored
independently**.
+
+The basic function of ROLLUP tables is to obtain coarser aggregated data on
the basis of Base tables.
+
+Let's illustrate the ROLLUP tables and their roles in different data models
with examples.
+
+#### ROLLUP in Aggregate Model and Uniq Model
+
+Because Uniq is only a special case of the Aggregate model, we do not
distinguish it here.
+
+Example 1: Get the total consumption per user
+
+Following [Data Model Aggregate Model](data-model.html#Aggregate Model) in the
**Aggregate Model** section, the Base table structure is as follows:
+
+| ColumnName | Type | AggregationType | Comment
|
+| --------------- | ------------ | --------------- |
-------------------------------------- |
+| user_id | LARGEINT | | user id
|
Review Comment:
```suggestion
| user\_id | LARGEINT | | user id
|
```
##########
new-docs/en/data-table/hit-the-rollup.md:
##########
@@ -26,20 +26,164 @@ under the License.
# Rollup and query
+ROLLUP in multidimensional analysis means "scroll up", which means that data
is aggregated further at a specified granularity.
+
+### Basic concepts
+
+In Doris, we make the table created by the user through the table building
statement a Base table. Base table holds the basic data stored in the way
specified by the user's table-building statement.
+
+On top of the Base table, we can create any number of ROLLUP tables. These
ROLLUP data are generated based on the Base table and physically **stored
independently**.
+
+The basic function of ROLLUP tables is to obtain coarser aggregated data on
the basis of Base tables.
+
+Let's illustrate the ROLLUP tables and their roles in different data models
with examples.
+
+#### ROLLUP in Aggregate Model and Uniq Model
+
+Because Uniq is only a special case of the Aggregate model, we do not
distinguish it here.
+
+Example 1: Get the total consumption per user
+
+Following [Data Model Aggregate Model](data-model.html#Aggregate Model) in the
**Aggregate Model** section, the Base table structure is as follows:
+
+| ColumnName | Type | AggregationType | Comment
|
+| --------------- | ------------ | --------------- |
-------------------------------------- |
+| user_id | LARGEINT | | user id
|
+| date | DATE | | date of data filling
|
+| Time stamp | DATETIME | | Data filling time,
accurate to seconds |
+| City | VARCHAR (20) | | User City
|
+| age | SMALLINT | | User age
|
+| sex | TINYINT | | User gender
|
+| Last_visit_date | DATETIME | REPLACE | Last user access time
|
+| Cost | BIGINT | SUM | Total User Consumption
|
+| max dwell time | INT | MAX | Maximum user residence
time |
+| min dwell time | INT | MIN | User minimum residence
time |
+
+The data stored are as follows:
+
+| user_id | date | timestamp | city | age | sex |
last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time |
Review Comment:
```suggestion
| user\_id | date | timestamp | city | age | sex |
last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time |
```
##########
new-docs/en/data-table/hit-the-rollup.md:
##########
@@ -26,20 +26,164 @@ under the License.
# Rollup and query
+ROLLUP in multidimensional analysis means "scroll up", which means that data
is aggregated further at a specified granularity.
+
+### Basic concepts
+
+In Doris, we make the table created by the user through the table building
statement a Base table. Base table holds the basic data stored in the way
specified by the user's table-building statement.
+
+On top of the Base table, we can create any number of ROLLUP tables. These
ROLLUP data are generated based on the Base table and physically **stored
independently**.
+
+The basic function of ROLLUP tables is to obtain coarser aggregated data on
the basis of Base tables.
+
+Let's illustrate the ROLLUP tables and their roles in different data models
with examples.
+
+#### ROLLUP in Aggregate Model and Uniq Model
+
+Because Uniq is only a special case of the Aggregate model, we do not
distinguish it here.
+
+Example 1: Get the total consumption per user
+
+Following [Data Model Aggregate Model](data-model.html#Aggregate Model) in the
**Aggregate Model** section, the Base table structure is as follows:
+
+| ColumnName | Type | AggregationType | Comment
|
+| --------------- | ------------ | --------------- |
-------------------------------------- |
+| user_id | LARGEINT | | user id
|
+| date | DATE | | date of data filling
|
+| Time stamp | DATETIME | | Data filling time,
accurate to seconds |
+| City | VARCHAR (20) | | User City
|
+| age | SMALLINT | | User age
|
+| sex | TINYINT | | User gender
|
+| Last_visit_date | DATETIME | REPLACE | Last user access time
|
+| Cost | BIGINT | SUM | Total User Consumption
|
+| max dwell time | INT | MAX | Maximum user residence
time |
+| min dwell time | INT | MIN | User minimum residence
time |
+
+The data stored are as follows:
+
+| user_id | date | timestamp | city | age | sex |
last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time |
+| ------- | ---------- | ------------------- | --------- | ---- | ---- |
------------------- | ---- | ---------------- | ---------------- |
+| 10000 | 2017-10-01 | 2017-10-01 08:00:05 | Beijing | 20 | 0 |
2017-10-01 06:00 | 20 | 10 | 10 |
+| 10000 | 2017-10-01 | 2017-10-01 09:00:05 | Beijing | 20 | 0 |
2017-10-01 07:00 | 15 | 2 | 2 |
+| 10001 | 2017-10-01 | 2017-10-01 18:12:10 | Beijing | 30 | 1 |
2017-10-01 17:05:45 | 2 | 22 | 22 |
+| 10002 | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai | 20 | 1 |
2017-10-02 12:59:12 | 200 | 5 | 5 |
+| 10003 | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32 | 0 |
2017-10-02 11:20:00 | 30 | 11 | 11 |
+| 10004 | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen | 35 | 0 |
2017-10-01 10:00:15 | 100 | 3 | 3 |
+| 10004 | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen | 35 | 0 |
2017-10-03 10:20:22 | 11 | 6 | 6 |
+
+On this basis, we create a ROLLUP:
+
+| ColumnName |
+| ---------- |
+| user_id |
+| cost |
+
+The ROLLUP contains only two columns: user_id and cost. After the creation,
the data stored in the ROLLUP is as follows:
Review Comment:
```suggestion
The ROLLUP contains only two columns: user\_id and cost. After the creation,
the data stored in the ROLLUP is as follows:
```
##########
new-docs/en/data-table/hit-the-rollup.md:
##########
@@ -26,20 +26,164 @@ under the License.
# Rollup and query
+ROLLUP in multidimensional analysis means "scroll up", which means that data
is aggregated further at a specified granularity.
+
+### Basic concepts
+
+In Doris, we make the table created by the user through the table building
statement a Base table. Base table holds the basic data stored in the way
specified by the user's table-building statement.
+
+On top of the Base table, we can create any number of ROLLUP tables. These
ROLLUP data are generated based on the Base table and physically **stored
independently**.
+
+The basic function of ROLLUP tables is to obtain coarser aggregated data on
the basis of Base tables.
+
+Let's illustrate the ROLLUP tables and their roles in different data models
with examples.
+
+#### ROLLUP in Aggregate Model and Uniq Model
+
+Because Uniq is only a special case of the Aggregate model, we do not
distinguish it here.
+
+Example 1: Get the total consumption per user
+
+Following [Data Model Aggregate Model](data-model.html#Aggregate Model) in the
**Aggregate Model** section, the Base table structure is as follows:
+
+| ColumnName | Type | AggregationType | Comment
|
+| --------------- | ------------ | --------------- |
-------------------------------------- |
+| user_id | LARGEINT | | user id
|
+| date | DATE | | date of data filling
|
+| Time stamp | DATETIME | | Data filling time,
accurate to seconds |
+| City | VARCHAR (20) | | User City
|
+| age | SMALLINT | | User age
|
+| sex | TINYINT | | User gender
|
+| Last_visit_date | DATETIME | REPLACE | Last user access time
|
+| Cost | BIGINT | SUM | Total User Consumption
|
+| max dwell time | INT | MAX | Maximum user residence
time |
+| min dwell time | INT | MIN | User minimum residence
time |
+
+The data stored are as follows:
+
+| user_id | date | timestamp | city | age | sex |
last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time |
+| ------- | ---------- | ------------------- | --------- | ---- | ---- |
------------------- | ---- | ---------------- | ---------------- |
+| 10000 | 2017-10-01 | 2017-10-01 08:00:05 | Beijing | 20 | 0 |
2017-10-01 06:00 | 20 | 10 | 10 |
+| 10000 | 2017-10-01 | 2017-10-01 09:00:05 | Beijing | 20 | 0 |
2017-10-01 07:00 | 15 | 2 | 2 |
+| 10001 | 2017-10-01 | 2017-10-01 18:12:10 | Beijing | 30 | 1 |
2017-10-01 17:05:45 | 2 | 22 | 22 |
+| 10002 | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai | 20 | 1 |
2017-10-02 12:59:12 | 200 | 5 | 5 |
+| 10003 | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32 | 0 |
2017-10-02 11:20:00 | 30 | 11 | 11 |
+| 10004 | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen | 35 | 0 |
2017-10-01 10:00:15 | 100 | 3 | 3 |
+| 10004 | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen | 35 | 0 |
2017-10-03 10:20:22 | 11 | 6 | 6 |
+
+On this basis, we create a ROLLUP:
+
+| ColumnName |
+| ---------- |
+| user_id |
+| cost |
+
+The ROLLUP contains only two columns: user_id and cost. After the creation,
the data stored in the ROLLUP is as follows:
+
+| user\_id | cost |
+| -------- | ---- |
+| 10000 | 35 |
+| 10001 | 2 |
+| 10002 | 200 |
+| 10003 | 30 |
+| 10004 | 111 |
+
+As you can see, ROLLUP retains only the results of SUM on the cost column for
each user_id. So when we do the following query:
+
+`SELECT user_id, sum(cost) FROM table GROUP BY user_id;`
+
+Doris automatically hits the ROLLUP table, thus completing the aggregated
query by scanning only a very small amount of data.
+
+2. Example 2: Get the total consumption, the longest and shortest page
residence time of users of different ages in different cities
+
+Follow example 1. Based on the Base table, we create a ROLLUP:
+
+| ColumnName | Type | AggregationType | Comment
|
+| -------------- | ------------ | --------------- |
--------------------------- |
+| City | VARCHAR (20) | | User City
|
+| age | SMALLINT | | User age
|
+| Cost | BIGINT | SUM | Total User Consumption
|
+| max dwell time | INT | MAX | Maximum user residence
time |
+| min dwell time | INT | MIN | User minimum residence
time |
+
+After the creation, the data stored in the ROLLUP is as follows:
+
+| city | age | cost | max\_dwell\_time | min\_dwell\_time |
+| --------- | ---- | ---- | ---------------- | ---------------- |
+| Beijing | 20 | 35 | 10 | 2 |
+| Beijing | 30 | 2 | 22 | 22 |
+| Shanghai | 20 | 200 | 5 | 5 |
+| Guangzhou | 32 | 30 | 11 | 11 |
+| Shenzhen | 35 | 111 | 6 | 3 |
+
+When we do the following queries:
+
+```sql
+mysql> SELECT city, age, sum(cost), max(max_dwell_time), min(min_dwell_time)
FROM table GROUP BY city, age;
+mysql> SELECT city, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM
table GROUP BY city;
+mysql> SELECT city, age, sum(cost), min(min_dwell_time) FROM table GROUP BY
city, age;
+```
+
+Doris automatically hits the ROLLUP table.
+
+#### ROLLUP in Duplicate Model
+
+Because the Duplicate model has no aggregate semantics. So the ROLLLUP in this
model has lost the meaning of "scroll up". It's just to adjust the column order
to hit the prefix index. In the next section, we will introduce prefix index in
[data model prefix index](data-model.html#prefix index), and how to use ROLLUP
to change prefix index in order to achieve better query efficiency.
+
+## ROLLUP adjusts prefix index
+
+Because column order is specified when a table is built, there is only one
prefix index for a table. This may be inefficient for queries that use other
columns that cannot hit prefix indexes as conditions. Therefore, we can
manually adjust the order of columns by creating ROLLUP. Examples are given.
+
+The structure of the Base table is as follows:
+
+| ColumnName | Type |
+| ---------------- | ------------ |
+| user\_id | BIGINT |
+| age | INT |
+| message | VARCHAR(100) |
+| max\_dwell\_time | DATETIME |
+| min\_dwell\_time | DATETIME |
+
+On this basis, we can create a ROLLUP table:
+
+| ColumnName | Type |
+| ---------------- | ------------ |
+| age | INT |
+| user\_id | BIGINT |
+| message | VARCHAR(100) |
+| max\_dwell\_time | DATETIME |
+| min\_dwell\_time | DATETIME |
+
+As you can see, the columns of ROLLUP and Base tables are exactly the same,
just changing the order of user_id and age. So when we do the following query:
Review Comment:
```suggestion
As you can see, the columns of ROLLUP and Base tables are exactly the same,
just changing the order of user\_id and age. So when we do the following query:
```
##########
new-docs/en/data-table/hit-the-rollup.md:
##########
@@ -26,20 +26,164 @@ under the License.
# Rollup and query
+ROLLUP in multidimensional analysis means "scroll up", which means that data
is aggregated further at a specified granularity.
+
+### Basic concepts
+
+In Doris, we make the table created by the user through the table building
statement a Base table. Base table holds the basic data stored in the way
specified by the user's table-building statement.
+
+On top of the Base table, we can create any number of ROLLUP tables. These
ROLLUP data are generated based on the Base table and physically **stored
independently**.
+
+The basic function of ROLLUP tables is to obtain coarser aggregated data on
the basis of Base tables.
+
+Let's illustrate the ROLLUP tables and their roles in different data models
with examples.
+
+#### ROLLUP in Aggregate Model and Uniq Model
+
+Because Uniq is only a special case of the Aggregate model, we do not
distinguish it here.
+
+Example 1: Get the total consumption per user
+
+Following [Data Model Aggregate Model](data-model.html#Aggregate Model) in the
**Aggregate Model** section, the Base table structure is as follows:
+
+| ColumnName | Type | AggregationType | Comment
|
+| --------------- | ------------ | --------------- |
-------------------------------------- |
+| user_id | LARGEINT | | user id
|
+| date | DATE | | date of data filling
|
+| Time stamp | DATETIME | | Data filling time,
accurate to seconds |
+| City | VARCHAR (20) | | User City
|
+| age | SMALLINT | | User age
|
+| sex | TINYINT | | User gender
|
+| Last_visit_date | DATETIME | REPLACE | Last user access time
|
+| Cost | BIGINT | SUM | Total User Consumption
|
+| max dwell time | INT | MAX | Maximum user residence
time |
+| min dwell time | INT | MIN | User minimum residence
time |
+
+The data stored are as follows:
+
+| user_id | date | timestamp | city | age | sex |
last\_visit\_date | cost | max\_dwell\_time | min\_dwell\_time |
+| ------- | ---------- | ------------------- | --------- | ---- | ---- |
------------------- | ---- | ---------------- | ---------------- |
+| 10000 | 2017-10-01 | 2017-10-01 08:00:05 | Beijing | 20 | 0 |
2017-10-01 06:00 | 20 | 10 | 10 |
+| 10000 | 2017-10-01 | 2017-10-01 09:00:05 | Beijing | 20 | 0 |
2017-10-01 07:00 | 15 | 2 | 2 |
+| 10001 | 2017-10-01 | 2017-10-01 18:12:10 | Beijing | 30 | 1 |
2017-10-01 17:05:45 | 2 | 22 | 22 |
+| 10002 | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai | 20 | 1 |
2017-10-02 12:59:12 | 200 | 5 | 5 |
+| 10003 | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32 | 0 |
2017-10-02 11:20:00 | 30 | 11 | 11 |
+| 10004 | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen | 35 | 0 |
2017-10-01 10:00:15 | 100 | 3 | 3 |
+| 10004 | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen | 35 | 0 |
2017-10-03 10:20:22 | 11 | 6 | 6 |
+
+On this basis, we create a ROLLUP:
+
+| ColumnName |
+| ---------- |
+| user_id |
+| cost |
+
+The ROLLUP contains only two columns: user_id and cost. After the creation,
the data stored in the ROLLUP is as follows:
+
+| user\_id | cost |
+| -------- | ---- |
+| 10000 | 35 |
+| 10001 | 2 |
+| 10002 | 200 |
+| 10003 | 30 |
+| 10004 | 111 |
+
+As you can see, ROLLUP retains only the results of SUM on the cost column for
each user_id. So when we do the following query:
Review Comment:
```suggestion
As you can see, ROLLUP retains only the results of SUM on the cost column
for each user\_id. So when we do the following query:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]