[ 
https://issues.apache.org/jira/browse/KUDU-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429019#comment-16429019
 ] 

Andrew Wong edited comment on KUDU-2359 at 4/7/18 6:31 AM:
-----------------------------------------------------------

Clarifying this a bit based on some offline discussion with Adar, it's less 
that the file vanished; rather, our current code couldn't read anything from 
the instance file, and thus returned a "file not found" error. We took a look 
at `ls` via strace, we found that an EIO was triggered in getdents(). A snippet 
of the strace here:
{quote}{{ioctl(1</dev/pts/0>, SNDCTL_TMR_TIMEBASE or 
SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, \{B38400 opost isig icanon echo ...}) 
= 0}}
 {{ioctl(1</dev/pts/0>, TIOCGWINSZ, \{ws_row=88, ws_col=357, ws_xpixel=0, 
ws_ypixel=0}) = 0}}
 {{stat("/data/6", \{st_mode=S_IFDIR|S_ISVTX|0777, st_size=2048, ...}) = 0}}
 {{open("/data/6", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3}}
 {{fcntl(3</data/6>, F_GETFD) = 0x1 (flags FD_CLOEXEC)}}
 {{getdents(3</data/6>, 0x8ecc90, 32768) = -1 EIO (Input/output error)}}
 {{open("/usr/share/locale/locale.alias", O_RDONLY) = 4}}
 {{fstat(4</usr/share/locale/locale.alias>, \{st_mode=S_IFREG|0644, 
st_size=2512, ...}) = 0}}
{quote}
This might help explain why Kudu wasn't getting hit with an EIO – presumably, 
the functions we call are hiding some EIO farther down the call stack.

We discussed a few options, considering potentially having more stringent 
checking around a mount point for failures (e.g. snooping around the file 
system for more info on failures if we run into an error at the file level), 
but settled on the point that, at least for start up, treating missing instance 
files as failed instance files would have the desired behavior.

The case for update_dirs is trickier, for the reasons mentioned above. One 
implementation we considered was to perhaps treat _all_ instances that returned 
errors upon loading as "missing" when running `kudu fs update_dirs`. As long as 
we don't do anything silly like prematurely overwrite files before knowing that 
the entire operation has completed, we _should_ be able to get away with this, 
since the update will eventually fail at some point throughout the run of the 
tool. What we lose out on is, rather than short-circuiting if we see a disk 
failure, the update tool will attempt to do stuff (read, rewrite on other 
drives, etc.) because we're not sure whether we're "failed" or "missing" or 
whatever. We could have some heuristics like, "If we notice a failed instance, 
definitely do not try to update, but if we see a missing disk, try to update 
and if we can't because the disk has actually failed, revert everything" to 
make the semantics better, but for now I'll see how well this works.


was (Author: andrew.wong):
Clarifying this a bit based on some offline discussion with Adar, it's less 
that the file vanished; rather, our current code couldn't read anything from 
the instance file, and thus returned a "file not found" error. Looking at the 
file system with strace, we found that the EIO was triggered in getdents(). A 
snippet of the strace here:
{quote}{{ioctl(1</dev/pts/0>, SNDCTL_TMR_TIMEBASE or 
SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, \{B38400 opost isig icanon echo ...}) 
= 0}}
{{ioctl(1</dev/pts/0>, TIOCGWINSZ, \{ws_row=88, ws_col=357, ws_xpixel=0, 
ws_ypixel=0}) = 0}}
{{stat("/data/6", \{st_mode=S_IFDIR|S_ISVTX|0777, st_size=2048, ...}) = 0}}
{{open("/data/6", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3}}
{{fcntl(3</data/6>, F_GETFD) = 0x1 (flags FD_CLOEXEC)}}
{{getdents(3</data/6>, 0x8ecc90, 32768) = -1 EIO (Input/output error)}}
{{open("/usr/share/locale/locale.alias", O_RDONLY) = 4}}
{{fstat(4</usr/share/locale/locale.alias>, \{st_mode=S_IFREG|0644, 
st_size=2512, ...}) = 0}}{quote}
We discussed a few options, considering potentially having more stringent 
checking around a mount point for failures (snooping around the file system for 
more info on failures), but settled on the point that, at least for start up, 
treating missing instance files as failed instance files would have the desired 
behavior.

The case for update_dirs is trickier, for the reasons mentioned above. One 
implementation we considered was to perhaps treat _all_ instances that returned 
errors upon loading as missing when running `kudu fs update_dirs`. As long as 
we don't do anything silly like prematurely overwrite files before knowing that 
the entire operation has completed, we _should_ be able to get away with this, 
since presumably the update will eventually fail at some point throughout the 
run of the tool. What we lose out on is, rather than short-circuiting if we see 
a disk failure, the update tool will attempt to do stuff (read, rewrite on 
other drives, etc.) because we're not sure whether we're "failed" or "missing" 
or whatever. We could have some heuristics like, "If we notice a failed 
instance, definitely do not try to update, but if we see a missing disk, try to 
update and if we can't because the disk has actually failed, revert everything" 
to make the semantics better, but for now I'll see how well this works.

> tserver should allow starting with a small number of missing data dirs
> ----------------------------------------------------------------------
>
>                 Key: KUDU-2359
>                 URL: https://issues.apache.org/jira/browse/KUDU-2359
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs, tserver
>            Reporter: Todd Lipcon
>            Assignee: Andrew Wong
>            Priority: Major
>
> Often when a disk fails, its mount point will not come back up when the 
> server is restarted. Currently, Kudu will respond to this by failing to 
> restart with an error like:
> F0314 18:23:39.353916 112051 tablet_server_main.cc:80] Check failed: _s.ok() 
> Bad status: Already present: FS layout already exists; not overwriting 
> existing layout. See 
> https://kudu.apache.org/releases/1.8.0-SNAPSHOT/docs/troubleshooting.html: 
> unable to create file system roots: FSManager roots already exist: 
> /data/1/kudu,/data/2/kudu,/data/3/kudu,/data/5/kudu,/data/6/kudu,/data/7/kudu,/data/8/kudu,/data/1/kudu-wal
> However, this defeats some of the advantages of the "allow single disk 
> failure" work. One could use the update_data_dirs tool to remove the missing 
> disk, but you'd also need to persistently change the configuration of the 
> daemon, which is hard to do with a consistent configuration management.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to