empiredan opened a new issue, #1207: URL: https://github.com/apache/incubator-pegasus/issues/1207
Both [Test Release (dsn.replica.test)](https://github.com/apache/incubator-pegasus/actions/runs/3326102384/jobs/5505382914#logs) and [Test ASAN (dsn.replica.test)](https://github.com/apache/incubator-pegasus/actions/runs/3326102384/jobs/5505387446#logs) for dsn.replica.test have failed in CI workflows of https://github.com/apache/incubator-pegasus/pull/1205, with reported errors as follows: ``` [ RUN ] mutation_log_test.reset_from E2022-10-26 11:56:44.166 (1666785404166907303 282) replica.default0.0000010100010001: replica.cpp:569:init_disk_tag(): [1.1@] get disk tag of ./test-log failed: ERR_OBJECT_NOT_FOUND, init it to empty W2022-10-26 11:56:44.172 (1666785404172717263 282) replica.default0.0000010100010001: filesystem.cpp:381:rename_path(): rename from './test-log.1666785404172467160' to './test-log' failed, err = Directory not empty F2022-10-26 11:56:44.172 (1666785404172723363 282) replica.default0.0000010100010001: mutation_log.cpp:970:operator()(): assertion expression: false F2022-10-26 11:56:44.172 (1666785404172728763 282) replica.default0.0000010100010001: mutation_log.cpp:970:operator()(): rollback ./test-log.1666785404172[467](https://github.com/apache/incubator-pegasus/actions/runs/3326102384/jobs/5505382914#step:7:468)160 to ./test-log failed Aborted (core dumped) Error: Process completed with exit code 134. ``` Review the code of `mutation_log::reset_from` with the reported errors, some problems could be found. Firstly, the reason for core dump is that `err` in capture list of the lambda function for `dsn::defer` is passed as value rather than a reference. As a result, `err` for lambda function will always be `ERR_FILE_OPERATION_FAILED`; also, `utils::filesystem::rename_path(temp_dir, _dir)` will be called mistakenly. See the following code for details: ```c++ // define `defer` for rollback temp_dir when failed or remove temp_dir when success auto temp_dir_resolve = dsn::defer([this, err, temp_dir]() { if (err != ERR_OK) { if (!utils::filesystem::rename_path(temp_dir, _dir)) { // rollback failed means old log files are not be recovered, it may be lost if only // LOG_ERROR, dassert for manual resolve it dassert_f("rollback {} to {} failed", temp_dir, _dir); } } else { if (!dsn::utils::filesystem::remove_path(temp_dir)) { // temp dir allow delete failed, it's only garbage LOG_ERROR_F("remove temp dir {} failed", temp_dir); } } }); ``` Secondly, from the above code, another error can be found for `dassert_f("rollback {} to {} failed", temp_dir, _dir);`. The first parameter for `dassert_f` should have been a condition; however, "`rollback {} to {} failed`" is passed as the first parameter by mistake rather than the condition. Therefore, this assertion will never failed since this string will always be converted implicitly as true. Thirdly, the following code also renames a directory, however there is not any rollback process for this: ```c++ // move source dir to target dir if (!utils::filesystem::rename_path(dir, _dir)) { LOG_ERROR_F("rename {} to {} failed", dir, _dir); return err; } LOG_INFO_F("move {} to {} as our new log directory", dir, _dir); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
