wangningito created KUDU-3070:
---------------------------------
Summary: rewrite_raft_config could accept multiple tablet_ids as input
Key: KUDU-3070
URL: https://issues.apache.org/jira/browse/KUDU-3070
Project: Kudu
Issue Type: New Feature
Components: CLI
Reporter: wangningito
I work at a big data company that serves more than 1000 companies, and we have
adopted Kudu as a main or auxiliary storage engine. Some of our customers are
small startups: they have a lot of data, but running many nodes is too
expensive for them.
So some of our deployments have few nodes, a lot of data, and data that may not
be well compacted.
In our scenario, there are migration cases that need multiple tablet_ids as
input:
# from a standalone tserver to another standalone tserver
# from a 3-node tserver cluster to another 3-node tserver cluster
In the past, we had to do something like this:
{code:bash}
# First, download the tablet data via kudu local_replica copy_from_remote,
# then rewrite the Raft config for each tablet, one kudu invocation per tablet_id.
# printf emits one tablet_id per line so that xargs substitutes a single id for {}.
printf '%s\n' ${tablet_id_list} | xargs -I{} kudu local_replica cmeta rewrite_raft_config {} PEER_INFO -fs_data_dirs=xxx -fs_wal_dir=yyy
{code}
Downloading the data via copy_from_remote is blazing fast.
However, rewriting the Raft config of all tablets can take a long time: 30 to
60 seconds per tablet in my experience, and sometimes longer if the data is not
fully compacted. As a result, it sometimes takes us 2 hours to download the
tablet data but 6 hours to rewrite the metadata.
I noticed the following code fragment in the RewriteRaftConfig function:
{code:c++}
FsManager fs_manager(env, FsManagerOpts());
RETURN_NOT_OK(fs_manager.Open());
{code}
This means that fs_data_dirs and fs_wal_dir have to be opened 100 times if I
want to rewrite the Raft config of 100 tablets.
I think that if several tablets share the same Raft peers, the data directories
could be opened only once and the Raft config of all those tablets rewritten
in a single run.
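To illustrate the idea, here is a minimal, self-contained sketch that contrasts the two strategies. It is not Kudu code: FakeFsManager, RewriteOne, RewritePerTablet and RewriteBatch are hypothetical names made up for this example, and FakeFsManager only counts how often the expensive open step would run.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for Kudu's FsManager (NOT the real class).
// It only counts how many times the expensive Open() step is executed.
struct FakeFsManager {
  static int open_calls;
  bool opened = false;
  void Open() {
    ++open_calls;
    opened = true;
  }
};
int FakeFsManager::open_calls = 0;

// Hypothetical placeholder for the per-tablet cmeta rewrite.
// Here it just records the tablet id if the filesystem has been opened.
void RewriteOne(const FakeFsManager& fs, const std::string& tablet_id,
                std::vector<std::string>* rewritten) {
  if (fs.opened) {
    rewritten->push_back(tablet_id);
  }
}

// Behaviour as described in this issue: one Open() per tablet.
std::vector<std::string> RewritePerTablet(const std::vector<std::string>& ids) {
  std::vector<std::string> rewritten;
  for (const auto& id : ids) {
    FakeFsManager fs;
    fs.Open();
    RewriteOne(fs, id, &rewritten);
  }
  return rewritten;
}

// Proposed behaviour: open once, then rewrite every tablet in a loop.
std::vector<std::string> RewriteBatch(const std::vector<std::string>& ids) {
  std::vector<std::string> rewritten;
  FakeFsManager fs;
  fs.Open();
  for (const auto& id : ids) {
    RewriteOne(fs, id, &rewritten);
  }
  return rewritten;
}
```

With N tablet ids, RewritePerTablet calls Open() N times while RewriteBatch calls it once. Both rewrite the same set of tablets, so if the open step dominates the run time, as it appears to in our migrations, the batch form should remove most of the cost.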
--
This message was sent by Atlassian Jira
(v8.3.4#803005)