KEN-LJQ opened a new issue, #13409: URL: https://github.com/apache/dolphinscheduler/issues/13409
### Search before asking - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues. ### What happened When a task finish very soon, like within 1 seconds, may stuck running status forever, although it had finish successfully. while digging into code and log, we found that, when worker receive task, it will send two `ack` response first, and send `finish` response while task finish. But when task finish very fast, those 3 response may receive by master at the same time.  So, master may handle `finish` repsonse before `ack` response, which lead to wrong task status, and we can see it from db binlog 1. insert a record while submit ``` #230105 6:31:41 server id 13 end_log_pos 476308895 CRC32 0xcd211e98 Query thread_id=441009 exec_time=0 error_code=0 SET TIMESTAMP=1672871501/*!*/; INSERT INTO t_ds_task_instance ( dry_run, flag, environment_code, pid, task_params, task_type, task_instance_priority, task_code, worker_group, state, process_instance_id, executor_id, alert_flag, first_submit_time, max_retry_times, retry_times, submit_time, name, task_definition_version, delay_time, retry_interval ) VALUES ( 0, 1, 0, 0, '{"type":"HIVE","datasource":5,"sql":"CREATE TABLE if not exists `tmp.tmp_table_bss_consult_detail`(\\n`aid` int, \\n`consult_foreign_id` string, \\n`consult_durat ion_second` int, \\n`consult_start_time` int, \\n`consult_end_time` int, \\n`consult_from` int, \\n`consult_vip_level` int, \\n`consult_evaluation` int, \\n`consul t_evaluation_type` int)\\nPARTITIONED BY (`dt` string)\\nSTORED AS parquet","udfs":"","sqlType":"1","title":"","receivers":"","receiversCc":"","showType":"","local Params":[],"connParams":"","preStatements":[],"postStatements":[],"conditionResult":"null","dependence":"null","switchResult":"null","waitStartTimeout":null}', 'SQL', 2, 7484249458146, 'default', 0, 19434, 3, 0, '2023-01-05 06:31:41.514', 3, 0, '2023-01-05 06:31:41.514', 'test', 2, 0, 1 ) /*!*/; ``` 2. dispatch to worker ``` #230105 6:31:41 server id 13 end_log_pos 476309341 CRC32 0x9c9c2283 Query thread_id=441008 exec_time=0 error_code=0 SET TIMESTAMP=1672871501/*!*/; update t_ds_task_instance set host = '172.16.1.151:1235', submit_time = '2023-01-05 06:31:41.527' where id = 69502 /*!*/; ``` 4. update state to `running` ``` # at 476320841 #230105 6:31:42 server id 13 end_log_pos 476322293 CRC32 0x439e70b4 Query thread_id=441009 exec_time=0 error_code=0 SET TIMESTAMP=1672871502/*!*/; UPDATE t_ds_task_instance SET dry_run=0, flag=1, environment_code=0, pid=0, task_params='{"type":"HIVE","datasource":5,"sql":"CREATE TABLE if not exists `tmp.tmp_table_bss_consult_detail`(\\n`aid` int, \\n`consult_foreign_id` string, \\n`c onsult_duration_second` int, \\n`consult_start_time` int, \\n`consult_end_time` int, \\n`consult_from` int, \\n`consult_vip_level` int, \\n`consult_evaluation` int , \\n`consult_evaluation_type` int)\\nPARTITIONED BY (`dt` string)\\nSTORED AS parquet","udfs":"","sqlType":"1","title":"","receivers":"","receiversCc":"","showTyp e":"","localParams":[],"connParams":"","preStatements":[],"postStatements":[],"conditionResult":"null","dependence":"null","switchResult":"null","waitStartTimeout" :null}', task_type='SQL', task_instance_priority=2, task_code=7484249458146, log_path='/var/log/dolphinscheduler2/7484249458144_6/19434/69502.log', host='172.16.1.151:1235', start_time='2023-01-05 06:31:41', worker_group='default', state=1, process_instance_id=19434, executor_id=3, alert_flag=0, execute_path='/tmp/dolphinscheduler2/exec/process/7484249457760/7484249458144_6/19434/69502', first_submit_time='2023-01-05 06:31:42', max_retry_times=3, retry_times=0, submit_time='2023-01-05 06:31:42', name='test', task_definition_version=2, delay_time=0, retry_interval=1 WHERE id=69502 /*!*/; ``` 5. update state to `success` ``` #230105 6:31:42 server id 13 end_log_pos 476325363 CRC32 0x045eddfc Query thread_id=441009 exec_time=0 error_code=0 SET TIMESTAMP=1672871502/*!*/; UPDATE t_ds_task_instance SET dry_run=0, flag=1, environment_code=0, pid=0, task_params='{"type":"HIVE","datasource":5,"sql":"CREATE TABLE if not exists `tmp.tmp_table_bss_consult_detail`(\\n`aid` int, \\n`consult_foreign_id` string, \\n`c onsult_duration_second` int, \\n`consult_start_time` int, \\n`consult_end_time` int, \\n`consult_from` int, \\n`consult_vip_level` int, \\n`consult_evaluation` int , \\n`consult_evaluation_type` int)\\nPARTITIONED BY (`dt` string)\\nSTORED AS parquet","udfs":"","sqlType":"1","title":"","receivers":"","receiversCc":"","showTyp e":"","localParams":[],"connParams":"","preStatements":[],"postStatements":[],"conditionResult":"null","dependence":"null","switchResult":"null","waitStartTimeout" :null}', task_type='SQL', task_instance_priority=2, task_code=7484249458146, log_path='/var/log/dolphinscheduler2/7484249458144_6/19434/69502.log', host='172.16.1.151:1235', start_time='2023-01-05 06:31:41', worker_group='default', state=7, process_instance_id=19434, executor_id=3, alert_flag=0, execute_path='/tmp/dolphinscheduler2/exec/process/7484249457760/7484249458144_6/19434/69502', first_submit_time='2023-01-05 06:31:42', max_retry_times=3, retry_times=0, var_pool='[]', submit_time='2023-01-05 06:31:42', name='test', task_definition_version=2, delay_time=0, retry_interval=1, end_time='2023-01-05 06:31:42' WHERE id=69502 /*!*/; ``` 6. update state back to `running` ``` #230105 6:31:42 server id 13 end_log_pos 476326921 CRC32 0x0207b2c5 Query thread_id=441008 exec_time=0 error_code=0 SET TIMESTAMP=1672871502/*!*/; UPDATE t_ds_task_instance SET dry_run=0, flag=1, environment_code=0, pid=0, task_params='{"type":"HIVE","datasource":5,"sql":"CREATE TABLE if not exists `tmp.tmp_table_bss_consult_detail`(\\n`aid` int, \\n`consult_foreign_id` string, \\n`c onsult_duration_second` int, \\n`consult_start_time` int, \\n`consult_end_time` int, \\n`consult_from` int, \\n`consult_vip_level` int, \\n`consult_evaluation` int , \\n`consult_evaluation_type` int)\\nPARTITIONED BY (`dt` string)\\nSTORED AS parquet","udfs":"","sqlType":"1","title":"","receivers":"","receiversCc":"","showTyp e":"","localParams":[],"connParams":"","preStatements":[],"postStatements":[],"conditionResult":"null","dependence":"null","switchResult":"null","waitStartTimeout" :null}', task_type='SQL', task_instance_priority=2, task_code=7484249458146, log_path='/var/log/dolphinscheduler2/7484249458144_6/19434/69502.log', host='172.16.1.151:1235', start_time='2023-01-05 06:31:41', worker_group='default', state=1, process_instance_id=19434, executor_id=3, alert_flag=0, first_submit_time='2023-01-05 06:31:42', max_retry_times=3, retry_times=0, submit_time='2023-01-05 06:31:42', name='test', task_definition_version=2, delay_time=0, retry_interval=1 WHERE id=69502 ``` ### What you expected to happen The final state of task should be `success`, which state=7 ### How to reproduce Kind of hard to reproduce, we can try create a sql task with short run statment like `create table` , and create a branch merge like belove  then return this process multiple times, untill task node stuck at `running` state forever ### Anything else _No response_ ### Version 2.0.x ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
