This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git
The following commit(s) were added to refs/heads/master by this push:
new 1075648093 [doc]fix doc typo in data-model and date data type (#9571)
1075648093 is described below
commit 1075648093a321c099dd56f4e8a1d5b59fe90dd6
Author: dataalive <[email protected]>
AuthorDate: Sun May 15 10:17:46 2022 +0800
[doc]fix doc typo in data-model and date data type (#9571)
---
docs/en/data-table/data-model.md | 40 +++++++++++-----------
.../en/sql-manual/sql-reference/Data-Types/DATE.md | 14 +++++---
2 files changed, 30 insertions(+), 24 deletions(-)
diff --git a/docs/en/data-table/data-model.md b/docs/en/data-table/data-model.md
index fda9550c88..1f85c6fef1 100644
--- a/docs/en/data-table/data-model.md
+++ b/docs/en/data-table/data-model.md
@@ -136,14 +136,14 @@ Then when this batch of data is imported into Doris
correctly, the final storage
As you can see, there is only one line of aggregated data left for user 10000. The data of other users is consistent with the original data. Here we first explain the aggregated data of user 10000:
-The first five columns remain unchanged, starting with column 6
`last_visit_date':
+The first five columns remain unchanged, starting with column 6
`last_visit_date`:
-*`2017-10-01 07:00`: Because the `last_visit_date`column is aggregated by
REPLACE, the `2017-10-01 07:00` column has been replaced by `2017-10-01 06:00'.
-> Note: For data in the same import batch, the order of replacement is not
guaranteed for the aggregation of REPLACE. For example, in this case, it may be
`2017-10-01 06:00'. For data from different imported batches, it can be
guaranteed that the data from the latter batch will replace the former batch.
+* `2017-10-01 07:00`: Because the `last_visit_date` column is aggregated by REPLACE, `2017-10-01 07:00` has replaced `2017-10-01 06:00`.
+> Note: For data in the same import batch, the order of replacement is not guaranteed for REPLACE aggregation. In this example, the value finally saved may instead be `2017-10-01 06:00`. For data from different import batches, it is guaranteed that data from the later batch replaces data from the earlier batch.
-*`35`: Because the aggregation type of the `cost'column is SUM, 35 is
accumulated from 20 + 15.
-*`10`: Because the aggregation type of the`max_dwell_time'column is MAX, 10
and 2 take the maximum and get 10.
-*`2`: Because the aggregation type of `min_dwell_time'column is MIN, 10 and 2
take the minimum value and get 2.
+* `35`: Because the aggregation type of the `cost` column is SUM, 35 is accumulated from 20 + 15.
+* `10`: Because the aggregation type of the `max_dwell_time` column is MAX, the maximum of 10 and 2 is 10.
+* `2`: Because the aggregation type of the `min_dwell_time` column is MIN, the minimum of 10 and 2 is 2.
After aggregation, Doris ultimately only stores aggregated data. In other
words, detailed data will be lost and users can no longer query the detailed
data before aggregation.
@@ -276,10 +276,10 @@ This table structure is exactly the same as the following
table structure descri
|---|---|---|---|
| user_id | BIGINT | | user id|
| username | VARCHAR (50) | | User nickname|
-| City | VARCHAR (20) | REPLACE | User City|
+| city | VARCHAR (20) | REPLACE | User City|
| age | SMALLINT | REPLACE | User Age|
| sex | TINYINT | REPLACE | User Gender|
-| Phone | LARGEINT | REPLACE | User Phone|
+| phone | LARGEINT | REPLACE | User Phone|
| address | VARCHAR (500) | REPLACE | User Address|
| register_time | DATETIME | REPLACE | User registration time|
@@ -311,12 +311,12 @@ In some multidimensional analysis scenarios, data has
neither primary keys nor a
|ColumnName|Type|SortKey|Comment|
|---|---|---|---|
-| Timstamp | DATETIME | Yes | Logging Time|
-| Type | INT | Yes | Log Type|
-|error_code|INT|Yes|error code|
+| timestamp | DATETIME | Yes | Logging Time|
+| type | INT | Yes | Log Type|
+| error_code | INT | Yes | Error code|
| error_msg | VARCHAR (1024) | No | Error Details|
-|op_id|BIGINT|No|operator id|
-|op_time|DATETIME|No|operation time|
+| op_id|BIGINT|No|operator id|
+| op_time|DATETIME|No|operation time|
The CREATE TABLE statement is as follows:
```
@@ -337,9 +337,9 @@ PROPERTIES (
```
This data model differs from the Aggregate and Uniq models. Data is stored exactly as it appears in the imported file, without any aggregation. Even if two rows of data are identical, both will be retained.
-The DUPLICATE KEY specified in the table building statement is only used to
specify which columns the underlying data is sorted according to. (The more
appropriate name should be "Sorted Column", where the name "DUPLICATE KEY" is
used to specify the data model used. For more explanations of "Sorted Column",
see the section ** Prefix Index **. On the choice of DUPLICATE KEY, we
recommend that the first 2-4 columns be selected appropriately.
+The DUPLICATE KEY specified in the table creation statement is only used to specify which columns the underlying data is sorted by. (A more appropriate name would be "Sorted Column"; the name "DUPLICATE KEY" merely indicates the data model being used. For more on "Sorted Column", see the **Prefix Index** section.) For the DUPLICATE KEY, we recommend selecting the first 2-4 columns as appropriate.
-This data model is suitable for storing raw data without aggregation
requirements and primary key uniqueness constraints. For more usage scenarios,
see the ** Limitations of the Aggregation Model ** section.
+This data model is suitable for storing raw data without aggregation
requirements and primary key uniqueness constraints. For more usage scenarios,
see the **Limitations of the Aggregation Model** section.
## Limitations of aggregation model
@@ -351,9 +351,9 @@ The hypothesis table is structured as follows:
|ColumnName|Type|AggregationType|Comment|
|---|---|---|---|
-| userid | LARGEINT | | user id|
+| user\_id | LARGEINT | | user id|
| date | DATE | | date of data filling|
-| Cost | BIGINT | SUM | Total User Consumption|
+| cost | BIGINT | SUM | Total User Consumption|
Assume that there are two batches of data that have been imported into the
storage engine as follows:
@@ -395,7 +395,7 @@ Let's take the most basic count (*) query as an example:
`SELECT COUNT(*) FROM table;`
-In other databases, such queries return results quickly. Because in the
implementation, we can get the query result by counting rows at the time of
import and saving count statistics information, or by scanning only a column of
data to get count value at the time of query, with very little overhead. But in
Doris's aggregation model, the overhead of this query ** is very large **.
+In other databases, such queries return results quickly, because the engine can count rows at import time and save the statistic, or scan only a single column at query time to obtain the count, with very little overhead. But in Doris's aggregation model, the overhead of this query is **very large**.
Let's take the data as an example.
@@ -423,7 +423,7 @@ Because the final aggregation result is:
|10002|2017-11-21|39|
|10003|2017-11-22|22|
-So `select count (*) from table;` The correct result should be **4**. But if
we only scan the `user_id'column and add query aggregation, the final result is
**3** (10001, 10002, 10003). If aggregated without queries, the result is **5**
(a total of five rows in two batches). It can be seen that both results are
wrong.
+So the correct result of `select count(*) from table;` should be **4**. But if we scan only the `user_id` column and aggregate at query time, the result is **3** (10001, 10002, 10003); if we do not aggregate at query time, the result is **5** (five rows in total across the two batches). Both results are wrong.
In order to get the correct result, we must read both the `user_id` and `date` columns and **aggregate them at query time** to return the correct result of **4**. That is, in a count(*) query, Doris must scan all AGGREGATE KEY columns (here `user_id` and `date`) and aggregate them to obtain the semantically correct result. When there are many aggregate key columns, count(*) queries need to scan a large amount of data.
@@ -446,7 +446,7 @@ Duplicate model has no limitation of aggregation model.
Because the model does n
## Suggestions for Choosing Data Model
-Because the data model was established when the table was built, and **could
not be modified **. Therefore, it is very important to select an appropriate
data model**.
+Because the data model is established when the table is created and **cannot be modified afterwards, it is very important to select an appropriate data model**.
1. The Aggregate model can greatly reduce the amount of data scanned and the amount of query computation through pre-aggregation. It is very suitable for report query scenarios with fixed patterns. But this model is not friendly to count(*) queries. Also, because the aggregation method on the Value columns is fixed, semantic correctness should be considered for other types of aggregation queries.
2. The Uniq model guarantees primary key uniqueness for scenarios requiring a unique primary key constraint. However, it cannot exploit the query advantages brought by pre-aggregation such as ROLLUP (because its essence is REPLACE; there is no aggregation such as SUM).
diff --git a/docs/en/sql-manual/sql-reference/Data-Types/DATE.md
b/docs/en/sql-manual/sql-reference/Data-Types/DATE.md
index eb1e47257d..443012c23c 100644
--- a/docs/en/sql-manual/sql-reference/Data-Types/DATE.md
+++ b/docs/en/sql-manual/sql-reference/Data-Types/DATE.md
@@ -27,15 +27,21 @@ under the License.
## DATE
### Description
DATE function
-Syntax:
+
+#### Syntax
Date
Converts the input type to the DATE type.
date
Date type. The current range of values is ['0000-01-01', '9999-12-31'], and the default print format is 'YYYY-MM-DD'.
### example
-mysql> SELECT DATE('2003-12-31 01:02:03');
--> '2003-12-31'
-
+```
+SELECT DATE('2003-12-31 01:02:03');
++-----------------------------+
+| date('2003-12-31 01:02:03') |
++-----------------------------+
+| 2003-12-31 |
++-----------------------------+
+```
### keywords
DATE
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
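The count(*) semantics corrected in the data-model.md hunks above (scan all AGGREGATE KEY columns and aggregate, or the count is wrong) can be sketched in Python. This is an illustrative simulation, not Doris code: the column names `user_id`, `date`, and `cost` come from the document, but the row values are invented so that the raw-row, distinct-user, and aggregate-key counts reproduce the 5 / 3 / 4 of the document's example.

```python
# Illustrative sketch (not Doris source) of the aggregate-key count(*) pitfall.
from collections import defaultdict

def aggregate(batches):
    """Merge rows on the AGGREGATE KEY (user_id, date); cost uses SUM."""
    table = defaultdict(int)
    for batch in batches:
        for user_id, date, cost in batch:
            table[(user_id, date)] += cost  # SUM aggregation on the value column
    return dict(table)

batches = [
    [(10001, "2017-11-20", 50), (10002, "2017-11-21", 39)],   # batch 1
    [(10001, "2017-11-20", 1), (10001, "2017-11-21", 5),
     (10003, "2017-11-22", 22)],                              # batch 2
]

table = aggregate(batches)
raw_rows = sum(len(b) for b in batches)           # 5: no aggregation at all
users = len({r[0] for b in batches for r in b})   # 3: only user_id scanned
correct = len(table)                              # 4: full key (user_id, date)
print(raw_rows, users, correct)  # prints: 5 3 4
```

Scanning only `user_id` yields 3 and skipping aggregation yields 5; only grouping on the complete aggregate key gives the correct 4, which is why Doris must scan every AGGREGATE KEY column for count(*).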