Just remove /etc/cron.weekly/fstrim as a workaround. The underlying bug
is not in fstrim but somewhere in the kernel, probably in the XFS
filesystem driver and the interaction between preallocation and the
FIBMAP ioctl.


** Package changed: util-linux (Ubuntu) => linux (Ubuntu)

** Tags added: bot-stop-nagging

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1686687

Title:
  fstrim destroying XFS on SAN

Status in linux package in Ubuntu:
  New

Bug description:
  We observed severe data loss/filesystem corruption when executing
  fstrim on a filesystem hosted on an Eternus DX600 S3 system.

  Multipathing is configured over a Fibre Channel fabric, but the issue
  could also be reproduced with multipathing disabled, using one of the
  block devices directly.

  It could not be reproduced with a multipath device created via
  dmsetup, with four paths pointing to four loop devices backed by the
  same file.

  XFS cannot read vital filesystem metadata because the underlying
  storage device returns blocks filled with 0x00. The blocks are
  discarded via UNMAP commands, and since thin provisioning is used, the
  SAN deallocates them and returns 0x00 on subsequent reads. Running
  find yields errors such as "find: ./dir_16: Structure needs
  cleaning". In other tests, where more data had been written, files
  were still accessible but their checksums no longer matched.
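
  The failure mode described above can be illustrated with a toy model
  in Python. This is a hypothetical sketch, not the Eternus firmware,
  XFS, or the actual kernel bug: `ThinLun`, `BLOCK_SIZE` and the block
  layout are invented for illustration, and it assumes, as the report
  describes, that the array reads unmapped blocks back as zeros. It
  shows why a trim is harmless when only free blocks are unmapped, and
  destructive as soon as an in-use metadata block lands in an UNMAP range.

```python
from __future__ import annotations

# Hypothetical toy model of a thin-provisioned LUN (illustration only).
BLOCK_SIZE = 4096
ZERO_BLOCK = bytes(BLOCK_SIZE)


class ThinLun:
    """Toy thin-provisioned LUN: only written blocks hold backing storage."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.allocated: dict[int, bytes] = {}  # block number -> contents

    def _check(self, lba: int) -> None:
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"block {lba} out of range")

    def write(self, lba: int, data: bytes) -> None:
        self._check(lba)
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.allocated[lba] = data

    def read(self, lba: int) -> bytes:
        self._check(lba)
        # Assumption from the report: unallocated blocks read back as zeros.
        return self.allocated.get(lba, ZERO_BLOCK)

    def unmap(self, lba: int, count: int) -> None:
        # Simplified model of SCSI UNMAP: the array deallocates the range.
        for block in range(lba, lba + count):
            self._check(block)
            self.allocated.pop(block, None)


lun = ThinLun(num_blocks=16)
metadata = b"XFSB" + bytes(BLOCK_SIZE - 4)  # stand-in for a metadata block
lun.write(0, metadata)

# A correct trim only unmaps blocks the filesystem reports as free (1..15).
lun.unmap(1, 15)
assert lun.read(0) == metadata

# If the range sent down wrongly includes an in-use block, the array drops
# it and later reads return zeros instead of the metadata.
lun.unmap(0, 1)
assert lun.read(0) == ZERO_BLOCK
```

  The data is not lost at the moment of the UNMAP from the host's point
  of view; it only becomes visible once cached copies are gone and the
  block is read back from the array.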

  As a consequence, the XFS filesystem is left unusable and has to be
  recreated, which amounts to complete data loss. Repairing the
  filesystem proved not to be worth it, as backups were available and
  trust had already been compromised.

  The problem was discovered after installing a new storage server with
  Ubuntu 16.04, intended to replace the current machine running 14.04.
  Every weekend, the test volumes were corrupted. Investigation pointed
  to Sunday, 06:47, which is when `cron.weekly` runs. The job file
  `/etc/cron.weekly/fstrim` seemed the most likely culprit, so `fstrim -a`
  was run manually after `mkfs.xfs`, and the filesystem became damaged.
  The damage only became apparent after a `umount`/`mount` cycle, when
  all buffers had been flushed and data was re-read from the device.
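
  A minimal sketch of the verification step, in Python, assuming test
  data has already been written to a freshly created XFS mount: record
  an MD5 manifest, run `fstrim` on the mount point, do a `umount`/`mount`
  cycle, then record a second manifest and compare. This is a
  hypothetical helper, not the attached reproduction script; the names
  `md5_manifest` and `diff_manifests` are illustrative. Unreadable files
  (for example EUCLEAN, "Structure needs cleaning") are recorded with an
  error marker so they show up in the comparison.

```python
from __future__ import annotations

import hashlib
import os


def md5_manifest(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to its MD5 hex digest.

    Hypothetical helper. Files that cannot be read are recorded with an
    "error: ..." marker instead of a digest.
    """
    manifest: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest = hashlib.md5()
            try:
                with open(path, "rb") as fh:
                    # Hash in 1 MiB chunks so large files are not loaded at once.
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        digest.update(chunk)
            except OSError as exc:
                manifest[rel] = f"error: {exc.strerror}"
                continue
            manifest[rel] = digest.hexdigest()
    return manifest


def diff_manifests(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return sorted paths that vanished or whose checksum changed."""
    return sorted(
        path for path, digest in before.items() if after.get(path) != digest
    )
```

  Files that disappear because a directory can no longer be listed are
  also reported, since `os.walk` skips unreadable directories by default
  and those paths are then missing from the second manifest.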

  We could use configuration management to install a cron job that
  (every minute!) checks for /sbin/fstrim and renames it if present, but
  that would be a brittle and extremely unsatisfactory workaround. So for
  now, we are stuck on Ubuntu 14.04. Since util-linux is one of the most
  central packages, fstrim and its cron job cannot be avoided on an
  Ubuntu system.

  I have attached a script used to reproduce the bug reliably on our
  system and its log output, as well as excerpts from syslog and md5sum
  output.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1686687/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
