[
https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963856#comment-16963856
]
YangSong commented on KUDU-2975:
--------------------------------
Thank you, let me summarize the implementation:
1. We need to add a new gflag, such as {{--fs_wal_dirs}}, to support spreading
WALs across multiple directories. We should keep {{--fs_wal_dir}} for backwards
compatibility. Users can choose one of them.
2. The first time 'fs_manager' is initialized, it needs to generate an instance
file per WAL directory. If the data directories (fs_data_dirs) are not provided,
we use the write-ahead log directories (fs_wal_dirs) as data directories. If the
metadata directory is not provided, we use the first WAL directory or the first
data directory. If one of the WAL directories doesn't exist, report a fatal
error. If some of the WAL directories have an 'instance' file but others do
not, report a fatal error.
3. Add a class WalDirManager, maybe like this:
{quote}class WalDirManager {
 public:
  static Status Create(CanonicalizedRootsList wal_fs_roots,
                       std::unique_ptr<WalDirManager>* wal_manager);
  static Status Open(CanonicalizedRootsList wal_fs_roots,
                     std::unique_ptr<WalDirManager>* wal_manager);
  ~WalDirManager();
  void Shutdown();
  Status LoadWalDirFromPB(const std::string& tablet_id, const WalDirPB& pb);
  std::set<std::string> FindTabletsByWALDir(const std::string& wal_dir) const;
  Status FindWalDirByTabletId(const std::string& tablet_id,
                              std::string* wal_dir) const;
  Status MarkWalDirsFailed(const std::string& error_message = "");
  void MarkWalDirFailed(const std::string& dir);
  bool IsWalDirFailed(const std::string& dir) const;
  const std::set<std::string> GetFailedDataDirs() const;
  std::vector<std::string> GetWalDirs() const;
  std::string GetWalDirByUuid(std::string uuid) const;
  Status CreateWalDir(const std::string& tablet_id);

 private:
  WalDirManager(CanonicalizedRootsList canonicalized_wal_roots);
  const CanonicalizedRootsList canonicalized_wal_fs_roots_;
  typedef std::unordered_map<std::string, std::string> DirByUuidMap;
  DirByUuidMap dir_by_uuid_;
  typedef std::multimap<std::string, std::string> TabletsByDirMap;
  TabletsByDirMap tablets_by_dir_;
  typedef std::set<std::string> FailedWalDirSet;
  FailedWalDirSet failed_data_dirs_;
};
{quote}
* We need to update the "instance" file in each WAL directory when creating a
new WalDirManager. Each WAL directory generates its own uuid and records it
in the instance file.
* The directory structure may be like this:
{panel:title=Structure of one WAL directory}
----wal
--------instance
--------wals
------------tablet1_uuid
----------------index.0
----------------wal.0
------------tablet2_uuid
----------------index.0
----------------wal.0
{panel}
* When creating metadata for a tablet, we need to determine the WAL directory
for the tablet and record the uuid of the chosen directory in the tablet's
metadata via WalDirPB.
* The WAL directory for the tablet is determined by calling
"WalDirManager::CreateWalDir()". A simple approach is to track how many
tablets there are in each WAL directory and select the directory with the
fewest tablets each time.
* When deleting a tablet, we need to delete the relevant entries in
"TabletsByDirMap". For a tombstoned tablet, we also need to clear the WAL dir
from the metadata.
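The "fewest tablets" selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not existing Kudu code: the free function PickLeastLoadedWalDir is a hypothetical name, the multimap is assumed to be keyed by WAL directory in the same way as the proposed TabletsByDirMap, and skipping failed directories is an assumption the summary does not spell out.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of Kudu): returns the healthy WAL directory
// that currently hosts the fewest tablets, or an empty string if every
// directory is marked failed.
std::string PickLeastLoadedWalDir(
    const std::vector<std::string>& wal_dirs,
    const std::multimap<std::string, std::string>& tablets_by_dir,
    const std::set<std::string>& failed_dirs) {
  std::string best;
  std::size_t best_count = std::numeric_limits<std::size_t>::max();
  for (const std::string& dir : wal_dirs) {
    if (failed_dirs.count(dir) > 0) {
      continue;  // Assumption: never place new WALs on a failed directory.
    }
    // multimap::count gives the number of tablets recorded for this dir.
    const std::size_t num_tablets = tablets_by_dir.count(dir);
    if (num_tablets < best_count) {  // Strict '<' keeps the earliest dir on ties.
      best = dir;
      best_count = num_tablets;
    }
  }
  return best;
}
```

CreateWalDir() could call something like this, then insert the (directory, tablet_id) pair into the map so that the next call sees the updated count.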
4. After we've passed the initial FsManager checks and started bootstrapping, if
a tablet's metadata is missing WAL directory information and the tablet is not
tombstoned, we mark the tablet as failed. If the metadata is OK but the tablet
has rowsets and its WAL is missing, we also mark the tablet as failed. (This
covers a missing per-tablet directory such as "tablet1_uuid"; if the "wal"
directory itself is missing, Kudu will crash while checking FsManager.) I did a
test with the latest Kudu version: when I removed some tablets' WALs and then
restarted the tserver, the tserver started with an error like "Tablet failed to
bootstrap: Illegal state:Found rowsets but no log segments could be found.". If
the tserver was restarted immediately, the tablet was recovered by Raft. If we
waited a few minutes before restarting the tserver, the tablet had already been
recovered on another tserver, and the tablet was tombstoned.
5. If a disk IO error is reported while reading or writing a WAL
file/directory, we handle it similarly to data directory failures. We may need
to change the function "FailTabletsInDataDir(string uuid)" to
"FailTabletsInDir(DirType type, string uuid)", where "DirType" identifies
whether the directory is a data directory or a WAL directory.
6. We also need to modify the code in the tool that deals with "--fs_wal_dir".
Is this an accurate summary? There may be omissions or errors. This approach
seems relatively simple and can solve the problem quickly.
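To make the checks in item 2 concrete, here is a rough sketch of the all-or-nothing rule for 'instance' files. Everything in it is hypothetical and only for discussion: WalDirState, WalInitAction and DecideWalInit are made-up names, not FsManager APIs, and treating an empty directory list as fatal is an extra assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one configured WAL directory.
struct WalDirState {
  std::string path;
  bool exists;        // The directory is present on disk.
  bool has_instance;  // The directory contains an 'instance' file.
};

// Hypothetical outcome of the startup check.
enum class WalInitAction {
  kCreateInstances,  // First start: write an instance file in every dir.
  kOpenExisting,     // Every dir already has an instance file.
  kFatal             // Missing dir, or a mix of dirs with and without one.
};

WalInitAction DecideWalInit(const std::vector<WalDirState>& dirs) {
  if (dirs.empty()) {
    return WalInitAction::kFatal;  // Assumption: at least one dir is required.
  }
  std::size_t with_instance = 0;
  for (const WalDirState& dir : dirs) {
    if (!dir.exists) {
      return WalInitAction::kFatal;  // A configured WAL dir does not exist.
    }
    if (dir.has_instance) {
      ++with_instance;
    }
  }
  if (with_instance == 0) {
    return WalInitAction::kCreateInstances;
  }
  if (with_instance == dirs.size()) {
    return WalInitAction::kOpenExisting;
  }
  return WalInitAction::kFatal;  // Some dirs have an instance file, some do not.
}
```

Keeping the decision separate from the filesystem probing would also make it easy to unit test without touching real directories.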
> Spread WAL across multiple data directories
> -------------------------------------------
>
> Key: KUDU-2975
> URL: https://issues.apache.org/jira/browse/KUDU-2975
> Project: Kudu
> Issue Type: New Feature
> Components: fs, tablet, tserver
> Reporter: LiFu He
> Priority: Major
> Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new Kudu cluster in which every node has 12 SSDs. Then
> we created a big table and loaded data into it through Flink. We noticed that
> the utilization of the one SSD used to store the WAL was 100% while the others
> were idle. So we suggest spreading the WAL across multiple data directories.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)