github-actions[bot] commented on code in PR #61545:
URL: https://github.com/apache/doris/pull/61545#discussion_r3402322979
##########
cloud/src/meta-service/meta_service.cpp:
##########
@@ -1565,6 +1565,10 @@ void
MetaServiceImpl::prepare_restore_job(::google::protobuf::RpcController* con
rs_metas.assign(tablet_meta.rs_metas().begin(),
tablet_meta.rs_metas().end());
tablet_meta.clear_rs_metas(); // strip off rs meta
+ std::unique_ptr<DeleteBitmapPB> delete_bitmap =
+ std::make_unique<DeleteBitmapPB>(tablet_meta.delete_bitmap());
+ tablet_meta.clear_delete_bitmap(); // strip off delete bitmap
Review Comment:
Moving delete-bitmap persistence out of commit needs a compatibility path
for restore jobs that were already PREPARED by the old meta-service. In the old
code, prepare stored the bitmap inside `RestoreJobCloudPB.tablet_meta` and
commit wrote the KV. If meta-service rolls to this version between PREPARE and
COMMIT, this new commit path no longer writes `tablet_meta.delete_bitmap()`, so
the job can commit rowsets/tablet meta with no delete-bitmap KV and MoW deletes
become visible again. Please keep a commit-time fallback for
`restore_job_pb.tablet_meta().has_delete_bitmap()` or otherwise migrate/mark
prepared jobs before clearing/removing this path.
##########
cloud/src/meta-service/meta_service.cpp:
##########
@@ -1655,6 +1659,151 @@ void
MetaServiceImpl::prepare_restore_job(::google::protobuf::RpcController* con
}
saved_rowset_num += sub_restore_job_rs_metas.size();
}
+
+ // 3. save delete bitmap
Review Comment:
Defaulting a missing PREPARE `store_version` to 1 is unsafe during rolling
BE upgrades. Before this PR, `CloudMetaMgr::prepare_restore_job` did not set
this field; only `commit_restore_job` did. With new meta-service plus an old
BE, a cluster configured for v2 or dual delete-bitmap writes will store only v1
during prepare, and commit no longer has a chance to use the commit request's
`store_version` to write v2. Later versioned-read paths reject/read away from
v1, so restored MoW delete bitmaps can be missed. Please either reject/migrate
missing `store_version` when v2 is required, or keep a commit-time write
fallback using the commit request's version.
##########
cloud/src/meta-service/meta_service.cpp:
##########
@@ -1655,6 +1659,151 @@ void
MetaServiceImpl::prepare_restore_job(::google::protobuf::RpcController* con
}
saved_rowset_num += sub_restore_job_rs_metas.size();
}
+
+ // 3. save delete bitmap
+ auto store_version = request->has_store_version() ?
request->store_version() : 1;
+ bool write_v1 = store_version == 1 || store_version == 3;
+ bool write_v2 = store_version == 2 || store_version == 3;
+ if (write_v1 && delete_bitmap->rowset_ids_size() > 0) {
+ txn0.reset();
+ err = txn_kv_->create_txn(&txn0);
+ if (err != TxnErrorCode::TXN_OK) {
+ code = cast_as<ErrCategory::CREATE>(err);
+ msg = fmt::format("failed to create txn, err={}", err);
+ LOG(WARNING) << msg;
+ return;
+ }
+
+ for (size_t i = 0; i < delete_bitmap->rowset_ids_size(); ++i) {
+ MetaDeleteBitmapInfo key_info {instance_id, tablet_idx.tablet_id(),
+ delete_bitmap->rowset_ids(i),
delete_bitmap->versions(i),
+ delete_bitmap->segment_ids(i)};
+ std::string key;
+ meta_delete_bitmap_key(key_info, &key);
+ auto& val = delete_bitmap->segment_delete_bitmaps(i);
+
+ if (txn0->approximate_bytes() + key.size() * 3 + val.size() >
+ config::max_txn_commit_byte) {
+ err = txn0->commit();
+ if (err != TxnErrorCode::TXN_OK) {
+ code = cast_as<ErrCategory::COMMIT>(err);
+ msg = fmt::format("failed to update delete bitmap,
tablet_id={}, err={}",
+ tablet_idx.tablet_id(), err);
+ return;
+ }
+ err = txn_kv_->create_txn(&txn0);
+ if (err != TxnErrorCode::TXN_OK) {
+ code = cast_as<ErrCategory::CREATE>(err);
+ msg = "failed to init txn";
+ return;
+ }
+ }
+ txn0->put(key, val);
+ LOG_INFO("put delete bitmap key")
+ .tag("delete_bitmap_key", hex(key))
+ .tag("tablet_id", tablet_idx.tablet_id())
+ .tag("rowset_id", delete_bitmap->rowset_ids(i))
+ .tag("version", delete_bitmap->versions(i))
+ .tag("segment_id", delete_bitmap->segment_ids(i))
+ .tag("delete_bitmap_size", key.size() + val.size());
+ }
+ err = txn0->commit();
+ if (err != TxnErrorCode::TXN_OK) {
+ code = cast_as<ErrCategory::COMMIT>(err);
+ msg = fmt::format("failed to update delete bitmap, tablet_id={},
err={}",
+ tablet_idx.tablet_id(), err);
+ return;
+ }
+ }
+
+ if (write_v2 && delete_bitmap->rowset_ids_size() > 0) {
+ txn0.reset();
+ err = txn_kv_->create_txn(&txn0);
+ if (err != TxnErrorCode::TXN_OK) {
+ code = cast_as<ErrCategory::CREATE>(err);
+ msg = fmt::format("failed to create txn, err={}", err);
+ LOG(WARNING) << msg;
+ return;
+ }
+
+ std::string pre_rowset_id = "";
+ std::string cur_rowset_id = "";
+ DeleteBitmapPB delete_bitmap_pb;
+
+ auto store_delete_bitmap = [&](std::string rowset_id) {
+ DCHECK_NE(delete_bitmap_pb.rowset_ids_size(), 0);
+ std::string key;
+ versioned::MetaDeleteBitmapInfo key_info {instance_id,
tablet_idx.tablet_id(),
+ rowset_id};
+ versioned::meta_delete_bitmap_key(key_info, &key);
+ std::string val;
+ DeleteBitmapStoragePB delete_bitmap_storage;
+ delete_bitmap_storage.set_store_in_fdb(true);
+ *(delete_bitmap_storage.mutable_delete_bitmap()) =
std::move(delete_bitmap_pb);
+ if (!delete_bitmap_storage.SerializeToString(&val)) {
+ code = MetaServiceCode::PROTOBUF_SERIALIZE_ERR;
+ msg = "failed to serialize delete bitmap storage";
+ return;
+ }
+
+ if (txn0->approximate_bytes() + key.size() * 3 + val.size() >
+ config::max_txn_commit_byte) {
Review Comment:
When this threshold is hit, `txn0` already contains previously buffered v2
delete-bitmap blobs. This branch writes the current blob to that transaction,
checks the stale `err` from the earlier `create_txn`, then replaces `txn0`
without committing; all previously buffered blobs are dropped. The function
then writes only the current rowset into the new txn and can still return OK,
leaving earlier rowsets without delete bitmaps. This is reachable for a restore
with multiple v2 rowset delete bitmaps whose accumulated transaction exceeds
`max_txn_commit_byte`. Please commit the existing txn before starting a new
one, then add the current blob to the fresh txn, and add coverage that forces
this branch.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]