[ https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963802#comment-16963802 ]
YangSong commented on KUDU-2975:
--------------------------------
Thank you. Let me summarize the implementation:
# We need to add a new gflag, such as {{--fs_wal_dirs}}, to support spreading the WAL
across multiple directories, and we should keep {{--fs_wal_dir}} around for backward
compatibility. Users can choose either one.
# The first time 'fs_manager' is initialized, it needs to generate an instance
file in each WAL directory. If the data directories (fs_data_dirs) are not provided, we
use the write-ahead log directories (fs_wal_dirs) as data directories. If the
metadata directory is not provided, we use the first WAL directory or the first
data directory. If one of the WAL directories doesn't exist, report a fatal
error. If some WAL directories have an 'instance' file but others do not,
report a fatal error.
# Add a class WalDirManager, maybe like this:
```cpp
#include <map>
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>
// Status, CanonicalizedRootsList and WalDirPB (a new protobuf) are Kudu types.
class WalDirManager {
 public:
  static Status Create(CanonicalizedRootsList wal_fs_roots,
                       std::unique_ptr<WalDirManager>* wal_manager);
  static Status Open(CanonicalizedRootsList wal_fs_roots,
                     std::unique_ptr<WalDirManager>* wal_manager);
  ~WalDirManager();
  void Shutdown();
  Status LoadWalDirFromPB(const std::string& tablet_id, const WalDirPB& pb);
  std::set<std::string> FindTabletsByWALDir(const std::string& wal_dir) const;
  Status FindWalDirByTabletId(const std::string& tablet_id,
                              std::string* wal_dir) const;
  Status MarkWalDirsFailed(const std::string& error_message = "");
  void MarkWalDirFailed(const std::string& dir);
  bool IsWalDirFailed(const std::string& dir) const;
  std::set<std::string> GetFailedWalDirs() const;
  std::vector<std::string> GetWalDirs() const;
  std::string GetWalDirByUuid(const std::string& uuid) const;
  Status CreateWalDir(const std::string& tablet_id);

 private:
  explicit WalDirManager(CanonicalizedRootsList canonicalized_wal_roots);
  const CanonicalizedRootsList canonicalized_wal_fs_roots_;
  typedef std::unordered_map<std::string, std::string> DirByUuidMap;
  DirByUuidMap dir_by_uuid_;  // uuid -> WAL dir
  typedef std::multimap<std::string, std::string> TabletsByDirMap;
  TabletsByDirMap tablets_by_dir_;  // WAL dir -> tablet IDs
  typedef std::set<std::string> FailedWalDirSet;
  FailedWalDirSet failed_wal_dirs_;
};
```
We need to update the "instance" file under each WAL directory when creating a new
WalDirManager. Each WAL directory generates its own UUID and records it in its
instance file. The directory structure may look like this:
--wal
----instance
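The flag and directory rules in the first two points could be sketched as a pure C++17 function. This is only an illustration under assumptions: FsFlags, FsLayout, SplitDirs, DirPredicate and ResolveFsLayout are hypothetical names, the filesystem checks are passed in as callbacks, errors are plain strings rather than Kudu's Status, setting both WAL flags is treated as an error, and the metadata directory defaults to the first WAL directory.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the gflags discussed above.
struct FsFlags {
  std::string fs_wal_dir;       // existing single WAL directory
  std::string fs_wal_dirs;      // proposed comma-separated WAL directories
  std::string fs_data_dirs;     // comma-separated data directories
  std::string fs_metadata_dir;  // optional metadata directory
};

struct FsLayout {
  std::vector<std::string> wal_dirs;
  std::vector<std::string> data_dirs;
  std::string metadata_dir;
};

// Splits "a,b,c" into {"a", "b", "c"}, skipping empty entries.
std::vector<std::string> SplitDirs(const std::string& csv) {
  std::vector<std::string> dirs;
  std::stringstream ss(csv);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (!item.empty()) dirs.push_back(item);
  }
  return dirs;
}

using DirPredicate = std::function<bool(const std::string&)>;

// Applies the rules from the first two points. Returns "" on success, or a
// message for each case the comment treats as fatal.
std::string ResolveFsLayout(const FsFlags& flags, const DirPredicate& dir_exists,
                            const DirPredicate& has_instance, FsLayout* layout) {
  *layout = FsLayout();
  if (!flags.fs_wal_dir.empty() && !flags.fs_wal_dirs.empty()) {
    return "set only one of --fs_wal_dir and --fs_wal_dirs";  // assumption
  }
  if (!flags.fs_wal_dirs.empty()) {
    layout->wal_dirs = SplitDirs(flags.fs_wal_dirs);
  } else if (!flags.fs_wal_dir.empty()) {
    layout->wal_dirs = {flags.fs_wal_dir};  // backward-compatible path
  }
  if (layout->wal_dirs.empty()) return "no WAL directory configured";

  // Without --fs_data_dirs, the WAL directories double as data directories.
  layout->data_dirs = flags.fs_data_dirs.empty() ? layout->wal_dirs
                                                 : SplitDirs(flags.fs_data_dirs);
  // Without --fs_metadata_dir, use the first WAL directory (assumed choice).
  layout->metadata_dir = flags.fs_metadata_dir.empty() ? layout->wal_dirs.front()
                                                       : flags.fs_metadata_dir;

  size_t with_instance = 0;
  for (const std::string& dir : layout->wal_dirs) {
    if (!dir_exists(dir)) return "WAL directory does not exist: " + dir;
    if (has_instance(dir)) ++with_instance;
  }
  // Either every WAL dir has an instance file (reopen) or none does (first run).
  if (with_instance != 0 && with_instance != layout->wal_dirs.size()) {
    return "only some WAL directories have an instance file";
  }
  return "";
}
```

Returning an error string instead of aborting keeps the rules easy to test in isolation; in Kudu the fatal cases would presumably surface as a Status from the FsManager startup path.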
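The per-directory instance file with its own UUID could look roughly like the following C++17 sketch. It is hypothetical: LoadOrCreateWalInstances and GenerateUuid are illustrative names, the instance file here holds only a plain-text version-4 UUID (not Kudu's real on-disk format), and error handling is omitted.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <map>
#include <random>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Generates a random RFC 4122 version-4 UUID string. Not cryptographically
// strong; good enough for an illustration.
std::string GenerateUuid() {
  static std::mt19937_64 rng{std::random_device{}()};
  std::uniform_int_distribution<int> nibble(0, 15);
  const char* hex = "0123456789abcdef";
  std::string uuid;
  for (int i = 0; i < 36; ++i) {
    if (i == 8 || i == 13 || i == 18 || i == 23) {
      uuid += '-';
    } else if (i == 14) {
      uuid += '4';                       // version 4
    } else if (i == 19) {
      uuid += hex[8 + nibble(rng) % 4];  // variant: 8, 9, a or b
    } else {
      uuid += hex[nibble(rng)];
    }
  }
  assert(uuid.size() == 36);
  return uuid;
}

// Ensures every WAL dir has an "instance" file holding that dir's UUID,
// creating missing files, and returns a uuid -> dir map (cf. dir_by_uuid_).
std::map<std::string, std::string> LoadOrCreateWalInstances(
    const std::vector<std::string>& wal_dirs) {
  std::map<std::string, std::string> dir_by_uuid;
  for (const std::string& dir : wal_dirs) {
    const fs::path instance = fs::path(dir) / "instance";
    if (!fs::exists(instance)) {
      std::ofstream out(instance);  // closed when this block ends
      out << GenerateUuid() << '\n';
    }
    std::ifstream in(instance);
    std::string uuid;
    std::getline(in, uuid);
    dir_by_uuid[uuid] = dir;
  }
  return dir_by_uuid;
}
```

Because an existing instance file is read back rather than rewritten, reopening the same directories yields the same UUID-to-directory map, which is what a lookup such as GetWalDirByUuid would rely on.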
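The tablet-to-directory bookkeeping behind CreateWalDir, FindWalDirByTabletId, FindTabletsByWALDir and the failed-directory set might be modeled in memory as follows. WalDirAssigner is a hypothetical, simplified stand-in for the proposed WalDirManager (no Status, no persistence), and picking the healthy directory with the fewest tablets is an assumed placement policy; the comment does not specify one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified in-memory model of the proposed WalDirManager
// bookkeeping. The least-loaded placement policy is an assumption.
class WalDirAssigner {
 public:
  explicit WalDirAssigner(std::vector<std::string> wal_dirs)
      : wal_dirs_(std::move(wal_dirs)) {}

  // Picks the healthy WAL dir with the fewest tablets (ties go to the earliest
  // dir). Returns false if every WAL dir has failed.
  bool CreateWalDir(const std::string& tablet_id, std::string* wal_dir) {
    assert(dir_by_tablet_.count(tablet_id) == 0);  // assign each tablet once
    const std::string* best = nullptr;
    std::size_t best_count = 0;
    for (const std::string& dir : wal_dirs_) {
      if (IsWalDirFailed(dir)) continue;
      const std::size_t count = tablets_by_dir_.count(dir);
      if (best == nullptr || count < best_count) {
        best = &dir;
        best_count = count;
      }
    }
    if (best == nullptr) return false;
    tablets_by_dir_.emplace(*best, tablet_id);
    dir_by_tablet_[tablet_id] = *best;
    *wal_dir = *best;
    return true;
  }

  bool FindWalDirByTabletId(const std::string& tablet_id,
                            std::string* wal_dir) const {
    const auto it = dir_by_tablet_.find(tablet_id);
    if (it == dir_by_tablet_.end()) return false;
    *wal_dir = it->second;
    return true;
  }

  std::set<std::string> FindTabletsByWALDir(const std::string& dir) const {
    std::set<std::string> tablets;
    const auto range = tablets_by_dir_.equal_range(dir);
    for (auto it = range.first; it != range.second; ++it) {
      tablets.insert(it->second);
    }
    return tablets;
  }

  // New tablets skip a failed dir; tablets already there can be found with
  // FindTabletsByWALDir() and handled by the caller.
  void MarkWalDirFailed(const std::string& dir) { failed_dirs_.insert(dir); }
  bool IsWalDirFailed(const std::string& dir) const {
    return failed_dirs_.count(dir) > 0;
  }

 private:
  std::vector<std::string> wal_dirs_;
  std::multimap<std::string, std::string> tablets_by_dir_;  // dir -> tablet
  std::map<std::string, std::string> dir_by_tablet_;        // tablet -> dir
  std::set<std::string> failed_dirs_;
};
```

A least-loaded policy spreads new tablets' WALs across all SSDs, which targets the single-hot-disk symptom in the issue description; a real implementation might weigh write volume rather than tablet count.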
> Spread WAL across multiple data directories
> -------------------------------------------
>
> Key: KUDU-2975
> URL: https://issues.apache.org/jira/browse/KUDU-2975
> Project: Kudu
> Issue Type: New Feature
> Components: fs, tablet, tserver
> Reporter: LiFu He
> Priority: Major
> Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new Kudu cluster in which every node has 12 SSDs.
> Then we created a big table and loaded data into it through Flink. We noticed
> that the utilization of the one SSD used to store the WAL was 100% while the
> others were idle. So we suggest spreading the WAL across multiple data
> directories.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)