[ 
https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964460#comment-16964460
 ] 

Adar Dembo commented on KUDU-2975:
----------------------------------

Andrew, thanks for the detailed summary. I agree with pretty much everything 
you wrote.

We talked about this a little bit offline, but I'm also curious to hear whether 
you think failing replicas and "forgetting" their WALs is safe from a Raft 
perspective. I think the answer is "yes", because even if a WAL directory 
disappears, the cmeta tombstone created for the failed replica remains behind 
in the metadata directory. So if the tablet is reincarnated on the same tserver 
in the same term, the tombstone will tell us who we voted for and we can't 
change our vote. That said, I also know that LEADER replicas persist a NOOP op 
to the WAL before replicating any additional ops, but I don't remember why we 
do that. Is it OK if that NOOP were to disappear between the two replicas' 
lifetimes?
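To make the "yes" argument concrete, here is a minimal, hypothetical sketch (these are not Kudu's actual classes; `ConsensusMeta` and `MaybeGrantVote` are stand-ins invented for illustration) of why the persisted vote record alone keeps re-voting safe within a term, even after the WAL directory is gone:

```cpp
// Hypothetical sketch: the consensus metadata (cmeta) survives in the
// metadata directory even when the replica's WAL directory disappears.
#include <cassert>
#include <cstdint>
#include <string>

// Minimal stand-in for the cmeta tombstone left behind for a failed replica.
struct ConsensusMeta {
  int64_t current_term = 0;
  std::string voted_for;  // empty means "no vote cast this term"
};

// Raft's voting rule: at most one vote per term. Because this record is
// persisted, a replica reincarnated on the same tserver in the same term
// cannot change its vote.
bool MaybeGrantVote(ConsensusMeta* cmeta, int64_t term,
                    const std::string& candidate) {
  if (term < cmeta->current_term) return false;  // stale candidate
  if (term > cmeta->current_term) {              // new term: vote resets
    cmeta->current_term = term;
    cmeta->voted_for.clear();
  }
  if (cmeta->voted_for.empty()) {                // first vote this term
    cmeta->voted_for = candidate;
    return true;
  }
  return cmeta->voted_for == candidate;          // idempotent re-grant only
}
```

Note this only covers vote safety; it says nothing about the leader's NOOP op, which is the part the question above is about.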
 
{quote}
What happens if a WAL disk crashes and Kudu places a new replica of tablet A
on the same tablet server? While running, this might be fine, but if we
restart the tserver and the bad disk is readable, we might now have two WAL
directories for A! How should we handle this?
{quote}

Agreed that UUID mapping is the way to go here. Apart from being precise (i.e. 
a replica knows exactly on which disk and in which directory to find its WALs), 
it allows the bad-but-now-good disk to remain unused by Kudu, free for the 
admin to inspect and deal with on their own time.
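A minimal sketch of what that UUID mapping buys us (hypothetical names; not Kudu's actual API): each replica records the UUID of its WAL directory, and lookups go UUID-to-path through a registry populated only from healthy, in-use instance files. A bad-but-now-good disk's UUID simply isn't referenced by any live replica, so its stale WAL directory is never consulted:

```cpp
// Hypothetical sketch: resolve a tablet's WAL directory by UUID, not path.
#include <map>
#include <optional>
#include <string>

struct WalDirRegistry {
  // Directory UUID -> mount path, populated only from healthy instance files.
  std::map<std::string, std::string> healthy_dirs;
  // Tablet id -> UUID of the directory holding its WAL.
  std::map<std::string, std::string> tablet_wal_uuid;

  // Returns nothing if the tablet is unknown or its directory is not
  // currently healthy; a remounted "bad" disk's UUID never matches.
  std::optional<std::string> WalPathForTablet(
      const std::string& tablet_id) const {
    auto it = tablet_wal_uuid.find(tablet_id);
    if (it == tablet_wal_uuid.end()) return std::nullopt;
    auto dir = healthy_dirs.find(it->second);
    if (dir == healthy_dirs.end()) return std::nullopt;
    return dir->second + "/wals/" + tablet_id;
  }
};
```

In the duplicate-WAL-directory scenario from the quote, the reincarnated replica's entry points at the new directory's UUID, so the old disk's copy is inert even after a restart with the bad disk readable again.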

{quote}
BTW, another implementation I've thought of is expanding the DataDirManager's
responsibility to include WALs. That would allow this single
"DirectoryManager" to make sure that a tablet's WAL directory is a member of
the tablet replica's directory group, so the "failure" tracking happens in a
single entity. I don't like this approach as much now because it makes the
responsibilities of the DirectoryManager very large, and it tightly couples
the {{--fs_data_dirs}} flag with the WALs.
{quote}

I expect that there'll be a lot of commonality between WAL directories and data 
directories, at least in terms of management. For example:
* Creating/validating/updating instance files and UUIDs.
* Mapping from UUIDs to replicas (even though the two differ: a replica may 
use N data dirs but only 1 WAL dir).
* Mapping from failed directories to affected replicas.
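The last bullet, mapping failed directories to affected replicas, is the piece both managers would share. A hypothetical sketch of that common bookkeeping (invented names, not Kudu's actual classes):

```cpp
// Hypothetical sketch: shared index that both a data-dir manager and a
// WAL-dir manager could use to answer "which replicas does this failed
// directory affect?"
#include <map>
#include <set>
#include <string>
#include <vector>

struct DirUsageIndex {
  // Directory UUID -> tablet ids using it. A tablet may appear under N data
  // dirs but only one WAL dir; the index shape is the same either way.
  std::map<std::string, std::set<std::string>> tablets_by_dir;

  void Record(const std::string& dir_uuid, const std::string& tablet_id) {
    tablets_by_dir[dir_uuid].insert(tablet_id);
  }

  // Replicas to fail when the given directory goes bad.
  std::vector<std::string> TabletsAffectedBy(
      const std::string& failed_dir) const {
    auto it = tablets_by_dir.find(failed_dir);
    if (it == tablets_by_dir.end()) return {};
    return {it->second.begin(), it->second.end()};
  }
};
```

Factoring this out would let WAL and data directories share failure tracking without forcing one giant DirectoryManager to own both.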


> Spread WAL across multiple data directories
> -------------------------------------------
>
>                 Key: KUDU-2975
>                 URL: https://issues.apache.org/jira/browse/KUDU-2975
>             Project: Kudu
>          Issue Type: New Feature
>          Components: fs, tablet, tserver
>            Reporter: LiFu He
>            Priority: Major
>         Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new Kudu cluster in which every node has 12 SSDs. 
> Then we created a big table and loaded data into it through Flink. We 
> noticed that the utilization of the single SSD used to store the WAL was 
> 100% while the others were idle, so we suggest spreading the WAL across 
> multiple data directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
