xy720 opened a new issue #6887:
URL: https://github.com/apache/incubator-doris/issues/6887


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Version
   
   version 0.15
   
   ### What's Wrong?
   
   In some cases, the number of versions may exceed the limit during data synchronization.
   
   For example, if a user inserts one row per second into the upstream database that canal reads from, Doris receives one empty batch and one batch containing a single row every second.
   
   In our current implementation, when Doris gets an empty batch, it immediately sends all remaining local data to the backend and commits it. As a result, Doris commits a transaction every second, so the number of versions can easily grow fast enough to reach the threshold.
   
   So we must add a restriction to prevent Doris from committing transactions 
too frequently.
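
One possible shape for such a restriction, as a minimal sketch only: commit buffered rows when a minimum interval has elapsed since the last commit, or when the buffer has grown large, instead of on every empty batch. The class `CommitThrottle`, its method `tryCommit`, and both thresholds are hypothetical names chosen for illustration. They are not existing Doris code or configuration.

```java
/** Hypothetical helper (not part of Doris): decides when buffered rows may be committed. */
final class CommitThrottle {
    private final long minIntervalMillis; // assumed minimum time between two commits
    private final int maxPendingRows;     // assumed row count that forces an early commit
    private long lastCommitMillis;        // time of the most recent commit

    CommitThrottle(long minIntervalMillis, int maxPendingRows, long startMillis) {
        this.minIntervalMillis = minIntervalMillis;
        this.maxPendingRows = maxPendingRows;
        this.lastCommitMillis = startMillis;
    }

    /** Returns true if a commit should happen now, and records the commit time if so. */
    boolean tryCommit(long nowMillis, int pendingRows) {
        if (pendingRows <= 0) {
            return false; // nothing buffered, so an empty batch alone never triggers a commit
        }
        boolean intervalElapsed = nowMillis - lastCommitMillis >= minIntervalMillis;
        boolean bufferFull = pendingRows >= maxPendingRows;
        if (intervalElapsed || bufferFull) {
            lastCommitMillis = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate the scenario from this issue: one row arrives every second for 60 s.
        CommitThrottle throttle = new CommitThrottle(10_000, 1_000, 0);
        int pending = 0;
        int commits = 0;
        for (long now = 1_000; now <= 60_000; now += 1_000) {
            pending += 1; // one new row per second
            if (throttle.tryCommit(now, pending)) {
                commits++;
                pending = 0; // buffer is flushed by the commit
            }
        }
        System.out.println("commits in 60s: " + commits);
    }
}
```

With the assumed 10-second interval, the simulated minute produces 6 commits rather than 60, so versions accumulate far more slowly. The time is passed in as a parameter rather than read from the system clock so that the behaviour is easy to test.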
   
   ### What You Expected?
   
   Prevent Doris from committing transactions too frequently.
   
   ### How to Reproduce?
   
   1. Create a table and a SyncJob.
   
   ```
   CREATE TABLE `dbd` (
     `k1` int(11) NULL COMMENT "",
     `k2` int(11) NULL COMMENT "",
     `k3` tinyint(4) NULL COMMENT "",
     `k4` smallint(6) NULL COMMENT "",
     `k5` bigint(20) NULL COMMENT "",
     `k6` decimal(9, 3) NULL COMMENT "",
     `k7` char(5) NULL COMMENT "",
     `k8` varchar(20) NULL COMMENT "",
     `k9` double NULL COMMENT "",
     `k10` float NULL COMMENT "",
     `k11` date NULL COMMENT "",
     `k12` datetime NULL COMMENT ""
   ) UNIQUE KEY(`k1`)
   DISTRIBUTED BY HASH(`k1`) BUCKETS 8
   ```
   
   ```
   create sync canal_test.job3 
   ( 
       from canal_test.dbd into dbd
   ) 
   from binlog 
   (
       "type" = "canal",
       "canal.server.ip" = "127.0.0.1",
       "canal.server.port" = "11111",
       "canal.destination" = "example",
       "canal.username" = "",
       "canal.password" = ""
   );
   ```
   
   2. Insert one line of data into the upstream database every second.
   
   ```
   #!/bin/bash
   # Shell example: repeatedly insert rows into the upstream MySQL table.
   
   MYSQL_IP=127.0.0.1
   MYSQL_PORT=3306
   MYSQL_USER=root
   MYSQL_PASS=mlxtqbd
   
   # Keep the statement in one variable so the command line is not split.
   INSERT_SQL="insert into canal_test.dbd values (null, 1000, 10, 100, 100000, 0.14, 'test', 'test', 0.189, 0.234, '2021-10-19', now());"
   
   while true
   do
      mysql -h"${MYSQL_IP}" -P"${MYSQL_PORT}" -u"${MYSQL_USER}" -p"${MYSQL_PASS}" -e "${INSERT_SQL}"
      mysql -h"${MYSQL_IP}" -P"${MYSQL_PORT}" -u"${MYSQL_USER}" -p"${MYSQL_PASS}" -e "${INSERT_SQL}"
      # Pause for half a second (replaces `usleep 500000`, which is not available everywhere).
      sleep 0.5
   done
   ```
   
   3. You can see that the version count of the table grows continuously:
   
![a5e0a2ba01ec4935144253fe0a364af7](https://user-images.githubusercontent.com/22125576/137917462-32cc0336-1459-43f6-8b98-b710b0f516f5.png)
   
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


