This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 6abea9f0c99 [improvement] improve doc of auto inc column (#1510)
6abea9f0c99 is described below
commit 6abea9f0c9910dbb44bc765037dbce071c9faa1a
Author: Yongqiang YANG <[email protected]>
AuthorDate: Tue Dec 24 17:19:18 2024 +0800
[improvement] improve doc of auto inc column (#1510)
## Versions
- [ ] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---------
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: bobhan1 <[email protected]>
---
docs/table-design/auto-increment.md | 104 +++++++++++++++++++++---------------
1 file changed, 61 insertions(+), 43 deletions(-)
diff --git a/docs/table-design/auto-increment.md
b/docs/table-design/auto-increment.md
index 1e1eae38451..487c47b89f0 100644
--- a/docs/table-design/auto-increment.md
+++ b/docs/table-design/auto-increment.md
@@ -1,6 +1,6 @@
---
{
- "title": "Using AUTO_INCREMENT",
+ "title": "Auto-Increment Column",
"language": "en"
}
---
@@ -24,22 +24,42 @@ specific language governing permissions and limitations
under the License.
-->
+When importing data, Doris automatically assigns unique values to rows that do
not have specified values in the **auto-increment column**. This feature
simplifies data import workflows while maintaining flexibility.
-When importing data, Doris assigns a table-unique value to rows that do not
have specified values in the auto-increment column.
+---
## Functionality
-For tables containing an auto-increment column, during data import:
-- If the target columns don't include the auto-increment column, Doris will
populate the auto-increment column with generated values.
-- If the target columns include the auto-increment column, null values in the
imported data for that column will be replaced by values generated by Doris,
while non-null values will remain unchanged. Note that **non-null values can
disrupt the uniqueness of the auto-increment column values**.
+For tables with an auto-increment column, Doris processes data imports as
follows:
+
+- **Auto-Population (Column Excluded)**:
+ If the imported data does not include the auto-increment column, Doris
generates and populates unique values for this column.
+
+- **Partial Specification (Column Included)**:
+ - **Null Values**: Doris replaces null values in the imported data with
system-generated unique values.
+ - **Non-Null Values**: User-provided values remain unchanged.
+ > **Important**: User-provided non-null values can disrupt the uniqueness of
the auto-increment column.
+
+---
### Uniqueness
-Doris ensures that values generated on the auto-increment column have
**table-wide uniqueness**. However, it's important to note that **the
uniqueness of the auto-increment column only guarantees uniqueness for values
automatically filled by Doris and does not consider values provided by users**.
If a user explicitly inserts user-provided values for this table by specifying
the auto-increment column, this uniqueness cannot be guaranteed.
+Doris guarantees **table-wide uniqueness** for values it generates in the
auto-increment column. However:
+
+- **Guaranteed Uniqueness**: This applies only to system-generated values.
+- **User-Provided Values**: Doris does not validate or enforce uniqueness for
values specified by users in the auto-increment column. This may result in
duplicate entries.
+
+---
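To make the user-provided case concrete, here is a minimal sketch (the table name `uniq_demo` and the inserted values are illustrative, not from this commit; the ids Doris generates depend on the configured start value):

```sql
-- Hypothetical table for illustration only.
CREATE TABLE `demo`.`uniq_demo` (
    `id` BIGINT NOT NULL AUTO_INCREMENT,
    `name` varchar(65533) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);

-- Doris generates a unique id for this row.
insert into uniq_demo(name) values("Bob");
-- A user-provided id is accepted as-is; if it collides with a value
-- Doris has generated (or will generate), the table ends up with
-- duplicate ids, since Doris does not validate user-provided values.
insert into uniq_demo(id, name) values(1, "Alice");
```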
### Density
-Doris ensures that the values generated on the auto-increment column are
dense, but it **cannot guarantee** that the values automatically generated in
the auto-increment column during an import will be entirely contiguous. Thus,
there might be some jumps in the values generated by the auto-increment column
during an import. This is because, for performance consideration, each BE
caches a portion of pre-allocated auto-increment column values, and these
cached values do not intersect betwe [...]
+Auto-increment values generated by Doris are generally **dense** but with some
considerations:
+
+- **Potential Gaps**: Gaps may appear due to performance optimizations. Each
backend node (BE) pre-allocates a block of unique values for efficiency, and
these blocks do not overlap between nodes.
+- **Non-Chronological Values**: Doris does not guarantee that values generated
in later imports are larger than those from earlier imports.
+ > **Note**: Auto-increment values cannot be used to infer the chronological
order of imports.
+
+---
## Syntax
@@ -47,7 +67,7 @@ To use auto-increment columns, you need to add the
`AUTO_INCREMENT` attribute to
### Examples
-1. Creating a Duplicate table with one key column as an auto-increment column:
+1. Creating a duplicate table with an auto-increment column as the key column.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -60,7 +80,7 @@ To use auto-increment columns, you need to add the
`AUTO_INCREMENT` attribute to
"replication_allocation" = "tag.location.default: 3"
);
-2. Creating a Duplicate table with one key column as an auto-increment column,
and set start value is 100:
+2. Creating a duplicate table with an auto-increment column as the key column,
and setting the starting value to 100.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -74,7 +94,7 @@ To use auto-increment columns, you need to add the
`AUTO_INCREMENT` attribute to
);
```
-3. Creating a Duplicate table with one value column as an auto-increment
column:
+3. Creating a duplicate table with an auto-increment column as one of the
value columns.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -90,7 +110,7 @@ To use auto-increment columns, you need to add the
`AUTO_INCREMENT` attribute to
);
```
-4. Creating a Unique tbl table with one key column as an auto-increment column:
+4. Creating a unique table with an auto-increment column as the key column.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -106,7 +126,7 @@ To use auto-increment columns, you need to add the
`AUTO_INCREMENT` attribute to
);
```
-5. Creating a Unique tbl table with one value column as an auto-increment
column:
+5. Creating a unique table with an auto-increment column as one of the value
columns.
```sql
CREATE TABLE `demo`.`tbl` (
@@ -123,16 +143,16 @@ To use auto-increment columns, you need to add the
`AUTO_INCREMENT` attribute to
### Constraints and Limitations
-- Only Duplicate model tables and Unique model tables can contain
auto-increment columns.
-- A table can contain at most one auto-increment column.
-- The type of the auto-increment column must be BIGINT and must be NOT NULL.
-- The manually specified starting value for an auto-increment column must be
greater than or equal to 0.
+- Auto-increment columns can only be used in Duplicate or Unique model tables.
+- A table can have only one auto-increment column.
+- The auto-increment column must be of type `BIGINT` and cannot be `NULL`.
+- The manually specified starting value for an auto-increment column must be 0
or greater.
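A minimal DDL sketch that satisfies all of these constraints (illustrative table name; the start-value syntax follows example 2 above):

```sql
CREATE TABLE `demo`.`constraints_demo` (
    -- Must be BIGINT and NOT NULL; the start value (100) must be >= 0.
    `id` BIGINT NOT NULL AUTO_INCREMENT(100),
    `value` int(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)          -- Duplicate (or Unique) model only
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_allocation" = "tag.location.default: 3"
);
```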
## Usage
### Import
-Consider the following table:
+Consider the table below:
```sql
CREATE TABLE `demo`.`tbl` (
@@ -148,7 +168,7 @@ PROPERTIES (
);
```
-When using the insert into statement to import data without specifying the
auto-increment column `id`, the `id` column will automatically be filled with
generated values.
+When using the `INSERT INTO` statement to import data without including the
auto-increment column `id`, Doris automatically generates and fills unique
values for the column.
```sql
mysql> insert into tbl(name, value) values("Bob", 10), ("Alice", 20), ("Jack",
30);
@@ -166,7 +186,7 @@ mysql> select * from tbl order by id;
3 rows in set (0.05 sec)
```
-Similarly, using stream load to import the file test.csv without specifying
the auto-increment column `id` will result in the id column being automatically
filled with generated values.
+Similarly, when using stream load to import the file `test.csv` without
specifying the auto-increment column `id`, Doris will automatically populate
the `id` column with generated values.
test.csv:
```
@@ -191,8 +211,7 @@ mysql> select * from tbl order by id;
+------+-------+-------+
5 rows in set (0.04 sec)
```
-
-When importing using insert into statement while specifying the auto-increment
column `id`, null values in the imported data for that column will be replaced
by generated values.
+When importing data using the `INSERT INTO` statement and specifying the
auto-increment column `id`, any null values in the imported data for that
column will be replaced with generated values.
```sql
mysql> insert into tbl(id, name, value) values(null, "Doris", 60), (null,
"Nereids", 70);
@@ -216,9 +235,9 @@ mysql> select * from tbl order by id;
### Partial Update
-When performing a partial update on a merge-on-write Unique table containing
an auto-increment column:
+When performing a partial update on a merge-on-write Unique table with an
auto-increment column:
-If the auto-increment column is a key column, during partial updates, as users
must explicitly specify the key column, the target columns for partial column
updates must include the auto-increment column. In this scenario, the import
behavior is similar to regular partial updates.
+If the auto-increment column is a key column, users must explicitly specify it
during partial updates. As a result, the target columns for partial updates
must include the auto-increment column. In this case, the import behavior
aligns with that of standard partial updates.
```sql
mysql> CREATE TABLE `demo`.`tbl2` (
@@ -270,7 +289,7 @@ mysql> select * from tbl2 order by id;
4 rows in set (0.04 sec)
```
-When the auto-increment column is a non-key column and users haven't specified
the value for the auto-increment column, the value will be filled from existing
data rows in the table. If users specify the auto-increment column, null values
in the imported data for that column will be replaced by generated values,
while non-null values will remain unchanged, and then these data will be loaded
with the semantics of partial updates.
+When the auto-increment column is a non-key column and no value is provided,
its value will be derived from existing rows in the table. If a value is
specified for the auto-increment column, null values in the imported data will
be replaced with generated values, while non-null values will remain unchanged.
These records will then be processed according to the semantics of partial
updates.
```sql
mysql> CREATE TABLE `demo`.`tbl3` (
@@ -340,9 +359,10 @@ mysql> select * from tbl3 order by id;
### Dictionary Encoding
-Using bitmaps for audience analysis in user profile requires building a user
dictionary where each user corresponds to a unique integer dictionary value.
Aggregating these dictionary values can improve the performance of bitmap.
+Using bitmaps for audience analysis in user profiling involves creating a user
dictionary, where each user is assigned a unique integer as their dictionary
value. Aggregating these dictionary values can improve the performance of
bitmap operations.
+
+For example, in an offline UV (Unique Visitors) and PV (Page Views) analysis
scenario, consider a detailed user behavior table:
-Taking the offline UV and PV analysis scenario as an example, assuming there's
a detailed user behavior table:
```sql
CREATE TABLE `demo`.`dwd_dup_tbl` (
@@ -361,8 +381,7 @@ PROPERTIES (
);
```
-Using the auto-incrementa column to create the following dictionary table:
-
+Using the auto-increment column to create the following dictionary table:
```sql
CREATE TABLE `demo`.`dictionary_tbl` (
@@ -377,15 +396,14 @@ PROPERTIES (
);
```
-Import the value of `user_id` from existing data into the dictionary table,
establishing the mapping of `user_id` to integer values:
+Import the `user_id` values from existing data into the dictionary table to
map `user_id` to corresponding integer values:
```sql
insert into dictionary_tbl(user_id)
select user_id from dwd_dup_tbl group by user_id;
```
-Or import only the value of `user_id` in incrementa data into the dictionary
table alternatively:
-
+Alternatively, import only the `user_id` values from incremental data into the
dictionary table.
```sql
insert into dictionary_tbl(user_id)
@@ -393,9 +411,9 @@ select dwd_dup_tbl.user_id from dwd_dup_tbl left join
dictionary_tbl
on dwd_dup_tbl.user_id = dictionary_tbl.user_id where dwd_dup_tbl.visit_time >
'2023-12-10' and dictionary_tbl.user_id is NULL;
```
-In real-world scenarios, Flink connectors can also be employed to write data
into Doris.
+In practical applications, Flink connectors can be used to write data into
Doris.
-Assuming `dim1`, `dim3`, `dim5` represent statistical dimensions of interest
to us, create the following table to store aggregated results:
+To store aggregated results for the statistical dimensions `dim1`, `dim3`, and
`dim5`, create the following table:
```sql
CREATE TABLE `demo`.`dws_agg_tbl` (
@@ -412,7 +430,7 @@ PROPERTIES (
);
```
-Store the result of the data aggregation operations into the aggregation
result table:
+Save the aggregated data into the results table.
```sql
insert into dws_agg_tbl
@@ -420,7 +438,7 @@ select dwd_dup_tbl.dim1, dwd_dup_tbl.dim3,
dwd_dup_tbl.dim5, BITMAP_UNION(TO_BIT
from dwd_dup_tbl INNER JOIN dictionary_tbl on dwd_dup_tbl.user_id =
dictionary_tbl.user_id;
```
-Perform UV and PV queries using the following statement:
+Execute UV and PV queries with the following statement:
```sql
select dim1, dim3, dim5, user_id_bitmap as uv, pv from dws_agg_tbl;
@@ -428,7 +446,7 @@ select dim1, dim3, dim5, user_id_bitmap as uv, pv from
dws_agg_tbl;
### Efficient Pagination
-When displaying data on a page, pagination is often necessary. Traditional
pagination typically involves using `limit`, `offset`, and `order by` in SQL
queries. For instance, consider the following business table intended for
display:
+Pagination is often required when displaying data on a page. Traditional
pagination usually involves using `LIMIT`, `OFFSET`, and `ORDER BY` in SQL
queries. For example, consider the following business table designed for
display:
```sql
CREATE TABLE `demo`.`records_tbl` (
@@ -447,21 +465,21 @@ PROPERTIES (
);
```
-Assuming 100 records are displayed per page in pagination. To fetch the first
page's data, the following SQL query can be used:
+Assuming 100 records are displayed per page, the following SQL query can be
used to fetch data for the first page:
```sql
select * from records_tbl order by `key`, `name` limit 100;
```
-Fetching the data for the second page can be accomplished by:
+To fetch data for the second page, you can use the following query:
```sql
select * from records_tbl order by `key`, `name` limit 100 offset 100;
```
-However, when performing deep pagination queries (with large offsets), even if
the actual required data rows are few, this method still reads all data into
memory for full sorting before subsequent processing, which is quite
inefficient. Using an auto-incrementa column assigns a unique value to each
row, allowing the use of where `unique_value` > x limit y to filter a
significant amount of data beforehand, making pagination more efficient.
+However, when performing deep pagination queries (with large offsets), this
method can be inefficient, as it reads all data into memory for sorting before
processing, even if only a small number of rows are needed. By using an
auto-increment column, each row is assigned a unique value, enabling the use of
a query like `WHERE unique_value > x LIMIT y` to filter out a large portion of
the data in advance, making pagination more efficient.
-Continuing with the aforementioned business table, an auto-increment column is
added to the table to give each row a unique identifier:
+To illustrate this, an auto-increment column is added to the business table,
giving each row a unique identifier:
```sql
CREATE TABLE `demo`.`records_tbl2` (
@@ -481,19 +499,19 @@ PROPERTIES (
);
```
-For pagination displaying 100 records per page, to fetch the first page's
data, the following SQL query can be used:
+For pagination with 100 records per page, the following SQL query can be used
to fetch the data for the first page:
```sql
select * from records_tbl2 order by unique_value limit 100;
```
-By recording the maximum value of unique_value in the returned results, let's
assume it's 99. The following query can then fetch data for the second page:
+By recording the maximum value of `unique_value` from the returned results,
let's assume it is 99. The following query can then be used to fetch data for
the second page:
```sql
select * from records_tbl2 where unique_value > 99 order by unique_value limit
100;
```
-If directly querying contents from a later page and it's inconvenient to
directly obtain the maximum value of `unique_value` from the preceding page's
data (for instance, directly obtaining contents from the 101st page), the
following query can be used:
+If directly querying data from a later page and it's inconvenient to retrieve
the maximum value of `unique_value` from the previous page's results (for
example, when fetching data starting from the 101st page), the following query
can be used:
```sql
select key, name, address, city, nation, region, phone, mktsegment
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]