This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/branch-2.1 by this push:
new 6eba030897d [fix](chore) path gc should consider tablet migration
(#30095) (#30548)
6eba030897d is described below
commit 6eba030897dc3a7e7ca5f4f0b24c24e88c6fe503
Author: zhannngchen <[email protected]>
AuthorDate: Tue Jan 30 12:03:21 2024 +0800
[fix](chore) path gc should consider tablet migration (#30095) (#30548)
Background:
Migration creates a new tablet in a different DataDir; the old tablet is
moved to TabletManager::_shutdown_tablets.
The migration task does not copy the data of stale rowsets to the new tablet,
so after migration the new tablet does not contain the old tablet's stale rowsets.
The path GC process checks every path to determine whether it belongs to a
useless tablet or a useless rowset. If it does, the data of that tablet or
rowset is removed.
The issue:
When path GC gets a stale rowset path from the data dir of the old tablet, it
extracts the tablet id and rowset id.
It then checks whether the tablet id exists in TabletManager, and the answer is
YES!
It gets the tablet instance, which is the new tablet, then checks whether the
stale rowset id from the old tablet path exists in the new tablet instance, and
gets the answer NO.
The path GC process treats the rowset as a useless rowset, since it can't
find anything holding a reference to it, and deletes the data of this stale rowset.
But some queries may still hold a reference to this stale rowset, so the
deletion causes query failures.
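The faulty lookup described above can be modelled in a few lines. This is only a toy sketch: `ToyTablet`, `naive_is_useless_rowset`, the tablet id `10001` and the rowset ids are hypothetical names for illustration, not Doris code. It shows that a lookup keyed by tablet id alone returns the migrated tablet, which never references the old tablet's stale rowsets.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a running tablet; not the Doris Tablet class.
struct ToyTablet {
    std::string data_dir;           // directory the running tablet lives in
    std::set<std::string> rowsets;  // rowset ids the running tablet references
};

// Pre-fix style check: the tablet is looked up by id only, and the directory
// of the scanned path is never compared with the tablet's own directory.
inline bool naive_is_useless_rowset(const std::map<long, ToyTablet>& running,
                                    long tablet_id, const std::string& rowset_id) {
    auto it = running.find(tablet_id);
    assert(it != running.end());  // this sketch only models the "tablet exists" branch
    return it->second.rowsets.count(rowset_id) == 0;
}
```

With a running tablet `10001` that lives in `/data2` and holds only `rs_new`, a stale rowset `rs_stale` found under the old directory is reported as useless, even though a query may still be reading it.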
Solution:
The lifecycle of all rowsets in a shutdown tablet should be tied to
the lifecycle of that tablet.
Path GC needs to differentiate the old tablet from the new one created by
the migration task.
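The two checks this commit adds to data_dir.cpp can be summarised with a simplified model. `ToyDataDir`, `ToyRunningTablet` and both helper functions are hypothetical names chosen for this sketch; the real code compares `tablet->data_dir()` with `this` inside `DataDir`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; not the Doris classes.
struct ToyDataDir {
    std::string path;
};

struct ToyRunningTablet {
    const ToyDataDir* data_dir;  // directory that owns the running tablet
};

// Tablet-level pass: a path is a candidate for reclamation when no tablet is
// running under that id, or when the running tablet lives in another data dir
// (likely the old copy left behind by migration).
inline bool tablet_path_is_unused(const ToyRunningTablet* running,
                                  const ToyDataDir* scanned) {
    assert(scanned != nullptr);
    return running == nullptr || running->data_dir != scanned;
}

// Rowset-level pass: rowsets are only inspected when the running tablet lives
// in the scanned data dir; otherwise the path is left to the tablet-level pass.
inline bool rowset_gc_may_inspect(const ToyRunningTablet& running,
                                  const ToyDataDir* scanned) {
    assert(scanned != nullptr);
    return running.data_dir == scanned;
}
```

In this model a stale rowset under the old directory is never judged against the new tablet's rowset list; its data goes away only together with the old tablet path.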
---
be/src/olap/data_dir.cpp | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/be/src/olap/data_dir.cpp b/be/src/olap/data_dir.cpp
index 24dc88169b8..351e7fd992e 100644
--- a/be/src/olap/data_dir.cpp
+++ b/be/src/olap/data_dir.cpp
@@ -694,7 +694,14 @@ void DataDir::_perform_path_gc_by_tablet(std::vector<std::string>& tablet_paths)
std::swap(*forward, *backward);
continue;
}
- if (auto tablet = _tablet_manager->get_tablet(tablet_id); !tablet) {
+ auto tablet = _tablet_manager->get_tablet(tablet_id);
+ if (!tablet || tablet->data_dir() != this) {
+ if (tablet) {
+ LOG(INFO) << "The tablet in path " << path
+ << " is not same with the running one: " << tablet->data_dir()->_path
+ << "/" << tablet->tablet_path()
+ << ", might be the old tablet after migration, try to move it to trash";
+ }
_tablet_manager->try_delete_unused_tablet_path(this, tablet_id, schema_hash, path);
--backward;
std::swap(*forward, *backward);
@@ -740,6 +747,12 @@ void DataDir::_perform_path_gc_by_rowset(const std::vector<std::string>& tablet_
continue;
}
+ if (tablet->data_dir() != this) {
+ // Current running tablet is not in same data_dir, maybe it's a tablet after migration,
+ // will be reclaimed in the next time `_perform_path_gc_by_tablet`
+ continue;
+ }
+
bool exists;
std::vector<io::FileInfo> files;
auto st = io::global_local_filesystem()->list(path, true, &files, &exists);