On Fri, Mar 8, 2013 at 6:55 PM, Frederick Grose <[email protected]> wrote:
> On Thu, Mar 7, 2013 at 10:41 AM, <[email protected]> wrote:
>
>> > From: Frederick Grose <[email protected]>
>> > On Wed, Mar 6, 2013 at 3:59 PM, <[email protected]> wrote:
>> <snip>
>> > root@aos-61:46 # # Lets now make it all go wonky:
>> > root@aos-61:46 # time dd if=/dev/zero of=/foo
>> > Bus error
>> >
>> > real 1m15.775s
>> > user 0m2.818s
>> > sys 0m24.129s
>> > root@aos-61:46 #
>> > root@aos-61:46 # ls /root
>> > -bash: /bin/ls: Input/output error
>> > root@aos-61:46 # df -h
>> > -bash: /usr/bin/df: Input/output error
>> >
>> > root@aos-61:46 # mount
>> >
>> > -bash: /usr/bin/mount: Input/output error
>> >
>> > root@aos-61:46 # cat /proc/meminfo
>> >
>> > -bash: /usr/bin/cat: Input/output error
>> >
>> >
>> > Is this expected? Is there anything I can do, e.g., configuration-
>> > wise, that can prevent this? Ideally this would fail much like any
>> > other full disk situation. I understand that the overlay consumes
>> > space, i.e., memory, for this file growth, including file removals,
>> > but I'd at least like to be able to remotely reboot a system when in
>> > this state, however I can't even do that because the reboot command
>> > will either return the same I/O error or it may succeed but get the
>> > I/O error when systemd tries to read
>> /usr/lib/systemd/system/reboot.target.
>> >
>> > I dug around in bugzilla, but found nothing there. I can file a
>> > bug, but which package is likely at fault here?
>> > --
>> > John Florian
>> >
>> > See https://fedoraproject.org/wiki/LiveOS_image for some background
>> > and potential workarounds.
>> >
>> > --Fred --
>>
>>
>> There's really not much on that page that helps me here. I'm trying to
>> use Live images for a mostly-stateless embedded appliance OS deployed to
>> hundreds or thousands of devices. I realize that the COW design is always
>> going to be limited, but a more graceful failure mode is really needed,
>> somehow. For our use, the biggest gain in stability here actually comes
>> from systemd's journal with its trim-before-write approach instead of the
>> legacy write now, trim asynchronously approach we used to have. However,
>> that only covers one specific use case: logging. Writing to proper
>> persistent storage allows me to avoid the root file system overlay, but
>> most of these embedded devices use CF or SD cards for storage, which have
>> limited write cycles that must be respected.
>>
>> Is there a way to implement an artificial capacity limit that would
>> prevent processes from exhausting the overlay so that the reserve might be
>> used for recording the event and rebooting back to a safer state?
>>
>> At the very least, I think this page could benefit from a little
>> stronger, more explicit wording of this failure case. While it talks a
>> little about some work-arounds, it actually says very little about why they
>> are needed. Only in the "Overlay Recovery" section does it hint at the
>> crash potential.
>>
>> --
>> John Florian
>>
>
> Thank you for the review! I've updated the wiki page based on your
> comments,
> https://fedoraproject.org/wiki/LiveOS_image
>
> Documenting that a temporary overlay is a 0.5 GiB sparse file in a RAM
> filesystem gave me the idea to try using an overlay size greater than
> available memory, and hope that kernel out-of-memory warnings would
> intervene before the device-mapper filesystem invalidation.
>
> I modified /usr/sbin/dmsquash-live-root in the initramfs to create a
> temporary 500 GiB sparse overlay:
>
>   dd if=/dev/null of=/overlay bs=1024 count=1 seek=$((512*1024*1024)) 2> /dev/null
>
> Then after booting an updated, Fedora 18 Live desktop, LiveUSB read only
> and running your failure demo,
>
>   time dd if=/dev/zero of=/foo
>
> I got out-of-memory warnings after a file of about 450 MiB was written and
> the command returned--no crash!
>
> Some post test output:
>
> [root@localhost ~]# dmsetup status
> live-osimg-min: 0 8388608 snapshot 2584/2584 24
> live-rw: 0 8388608 snapshot 921720/1073741824 3600
>
> top - 18:11:53 up 17 min, 3 users, load average: 0.68, 0.75, 0.57
> Tasks: 182 total, 2 running, 180 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 1.6 us, 1.6 sy, 0.0 ni, 96.5 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
> KiB Mem: 3339812 total, 3260284 used, 79528 free, 316384 buffers
> KiB Swap: 3341308 total, 0 used, 3341308 free, 1948108 cached
>
> You might test this method in your systems and let us know how it works.
>
> --Fred
>

Pardon my bad observations; my above conclusion IS WRONG and unsupported by
the above test.

I deceived myself with an unfamiliar error message, and actually seem to
have tested James Heather's method in my last test. My root filesystem size
was 4 GiB with about 450 MiB free. An out-of-disc-space warning is what
actually popped up and caused the test command to exit before any other
failure or crash.

To retest the oversized overlay hypothesis, I resized the LiveUSB root
filesystem to 12 GiB and repeated the test on it as an attached LiveOS
filesystem, /dev/mapper/dm-PCBV6p (mounted at /mnt/a).

[root@localhost a]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 24/1073741824 16

[root@localhost ~]# dmsetup table
dm-PCBV6p: 0 25165824 snapshot 7:9 7:10 P 8

[root@localhost ~]# losetup -a
/dev/loop8: [2081]:1833 (/run/media/fgrose/LIVE/LiveOS/squashfs.img)
/dev/loop9: [1800]:3 (/run/media/livemnt-squash-mRJNIA/LiveOS/rootfs.img)
/dev/loop10: [0017]:58601 (/run/media/tmpvJjuX7)
/dev/loop11: [2081]:1832 (/run/media/fgrose/LIVE/LiveOS/home.img)

[root@localhost ~]# df -Th
Filesystem            Type      Size  Used Avail Use% Mounted on
devtmpfs              devtmpfs  1.6G     0  1.6G   0% /dev
tmpfs                 tmpfs     1.6G  152K  1.6G   1% /dev/shm
tmpfs                 tmpfs     1.6G  3.3M  1.6G   1% /run
tmpfs                 tmpfs     1.6G     0  1.6G   0% /sys/fs/cgroup
/dev/sda1             ext4       18G  8.8G  7.7G  54% /
tmpfs                 tmpfs     1.6G   28K  1.6G   1% /tmp
/dev/sdc1             vfat       15G  8.8G  6.2G  59% /run/media/fgrose/LIVE
/dev/loop8            squashfs  929M  929M     0 100% /run/media/livemnt-squash-mRJNIA
/dev/mapper/dm-PCBV6p ext4       12G  3.4G  8.2G  30% /mnt/a
/dev/loop11           ext4      380M   35M  325M  10% /mnt/a/home

[root@localhost ~]# mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=1652304k,nr_inodes=413076,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
/dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=37,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
configfs on /sys/kernel/config type configfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tmpfs on /tmp type tmpfs (rw,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/sdc1 on /run/media/fgrose/LIVE type vfat (rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,showexec,utf8,flush,errors=remount-ro,uhelper=udisks2)
/dev/sdb2 on /var/cache/yum type ext4 (rw,relatime,seclabel,data=ordered)
/dev/loop8 on /run/media/livemnt-squash-mRJNIA type squashfs (ro,relatime,seclabel)
/dev/mapper/dm-PCBV6p on /mnt/a type ext4 (rw,relatime,seclabel,data=ordered)
/dev/loop11 on /mnt/a/home type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sdc1 on /mnt/a/run/initramfs/live type vfat (rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,showexec,utf8,flush,errors=remount-ro)

The target filesystem, /dev/mapper/dm-PCBV6p, did go invalid and was changed
to read-only (ro), which led to this test command output:

[root@localhost a]# time dd if=/dev/zero of=foo
dd: writing to ‘foo’: Read-only file system
4029694+0 records in
4029693+0 records out
2063202816 bytes (2.1 GB) copied, 40.0696 s, 51.5 MB/s

real 0m40.079s
user 0m3.799s
sys 0m32.422s

In a separate terminal I manually monitored the dmsetup status:

[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 184/1073741824 16
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 192/1073741824 16
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 192/1073741824 16
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 263440/1073741824 1040
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 366360/1073741824 1440
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 526608/1073741824 2064
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 703280/1073741824 2752
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 904600/1073741824 3528
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 1131568/1073741824 4416
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 1383288/1073741824 5392
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 1579432/1073741824 6160
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 1579432/1073741824 6160
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 1731568/1073741824 6752
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 2191040/1073741824 8536
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 2420840/1073741824 9432
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 2632232/1073741824 10256
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 2632288/1073741824 10256
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 2964616/1073741824 11544
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot 3208632/1073741824 12496
[root@localhost ~]# dmsetup status
dm-PCBV6p: 0 25165824 snapshot Invalid

[root@localhost ~]# df -Th
Filesystem            Type      Size  Used Avail Use% Mounted on
devtmpfs              devtmpfs  1.6G     0  1.6G   0% /dev
tmpfs                 tmpfs     1.6G  504K  1.6G   1% /dev/shm
tmpfs                 tmpfs     1.6G 1000M  632M  62% /run
tmpfs                 tmpfs     1.6G     0  1.6G   0% /sys/fs/cgroup
/dev/sda1             ext4       18G  8.7G  7.7G  54% /
tmpfs                 tmpfs     1.6G   68K  1.6G   1% /tmp
/dev/sdc1             vfat       15G  8.8G  6.2G  59% /run/media/fgrose/LIVE
/dev/loop8            squashfs  929M  929M     0 100% /run/media/livemnt-squash-mRJNIA
/dev/mapper/dm-PCBV6p ext4       12G  4.6G  7.1G  40% /mnt/a
/dev/loop11           ext4      380M   35M  325M  10% /mnt/a/home

The invalidation occurred at the 1.6 GB size limit applied to the /run tmpfs,
where the overlay file attached as /dev/loop10 was stored:

[root@localhost ~]# losetup /dev/loop10
/dev/loop10: [0017]:58601 (/run/media/tmpvJjuX7)

[root@localhost ~]# ls /mnt/a
ls: cannot access /mnt/a/.readahead: Input/output error

top - 00:26:06 up 13 min, 4 users, load average: 0.55, 0.68, 0.44
Tasks: 204 total, 2 running, 202 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.3 us, 1.6 sy, 0.0 ni, 92.4 id, 1.3 wa, 0.3 hi, 0.0 si, 0.0 st
KiB Mem: 3339812 total, 3256176 used, 83636 free, 68956 buffers
KiB Swap: 3341308 total, 0 used, 3341308 free, 2312664 cached

Notice that swap was not activated, but free memory got down to ~83 MiB.
When I tested the above on the booted LiveUSB, 2-3 GiB of swap was activated
before the fatal crash.

So an oversized overlay DOES NOT prevent device-mapper invalidation by the
above test method.

--Fred
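
For anyone repeating this test, the snapshot figures in the dmsetup status
output can be read as sizes. What follows is only a minimal sketch. It
assumes the allocated/total fields printed for a snapshot target are counted
in 512-byte sectors, as the kernel's device-mapper snapshot documentation
describes them, and the function name snapshot_usage is made up for
illustration.

```shell
#!/bin/sh
# Sketch: turn one snapshot line of `dmsetup status` output into sizes.
# Assumption: the "<allocated>/<total>" field is in 512-byte sectors.
# snapshot_usage is a hypothetical helper name, not part of dmsetup.

snapshot_usage() {
    # Split the status line on whitespace into $1..$N, e.g.
    # "dm-PCBV6p: 0 25165824 snapshot 3208632/1073741824 12496"
    set -- $1
    name=${1%:}                       # device name without trailing colon
    if [ "$4" != "snapshot" ]; then
        echo "not a snapshot status line" >&2
        return 1
    fi
    case $5 in
        */*)
            allocated=${5%/*}         # sectors used in the overlay
            total=${5#*/}             # sectors available in the overlay
            echo "$name allocated_MiB=$((allocated * 512 / 1048576)) total_GiB=$((total * 512 / 1073741824))"
            ;;
        *)
            # Anything without a slash is reported as a state, e.g. Invalid
            echo "$name state=$5"
            ;;
    esac
}

# Last reading before invalidation, then the invalidated state.
snapshot_usage "dm-PCBV6p: 0 25165824 snapshot 3208632/1073741824 12496"
snapshot_usage "dm-PCBV6p: 0 25165824 snapshot Invalid"
```

Under that sector assumption, the last reading before invalidation
(3208632 sectors) works out to about 1566 MiB, which is close to the 1.6G
tmpfs limit shown by df, and the 1073741824-sector total corresponds to
512 GiB. Each line of plain dmsetup status output could be fed to the
function from a polling loop instead of being checked by hand.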
-- livecd mailing list [email protected] https://admin.fedoraproject.org/mailman/listinfo/livecd
