Re: AUFS when disk full (ENOSPC)

Steffen Dettmer Wed, 28 Aug 2013 05:16:36 -0700

Hi!

Thanks again for your prompt replies! It is great to know that
such good support is available.


On Tue, Aug 27, 2013 at 2:57 PM,  <sf...@users.sourceforge.net> wrote:
> Steffen Dettmer:
>> I assume the "most-likely" cause for us could be that some log
>> message appearing systematically and frequently (without having
>> automatic suppression) resulting in multiple log files typically 5
>> MB each, but of course there are measures against and the file
>> system should never run full.
>
> Unfortunately I am not sure what most-likely is.
> Aufs is used by LiveCD/DVD/USB or something, and by servers.
> And I don't know they are similar use-cases.

Yes, we use it on flash (technically not USB anymore but SATA,
but this should make no difference).

> Anyway as long as you have rapidly growing multiple log files, I am
> afraid that aufs as topmost writable tmpfs is not a good idea. As you
> wrote, such log files will eat all disk space up in the end.

Oh, I'm afraid I didn't describe this clearly enough. The facts here are:
- disk usage is monitored, log file sizes are limited and the files
  are "rotated", and "rotated" log files that are too many or too big
  are deleted
- file systems should not get full
- nevertheless, bugs (a logging flood, failure to delete a temp file,
  accidentally wget-ing URLs in a loop; all in third-party scripts,
  not in our software BTW) have caused disks to fill up
- for full disks, there are counter-measures: if everything else
  fails, folders can finally be deleted recursively (killing all
  third-party scripts as a last resort)
- but the last-resort emergency deletion may fail because "rm"
  fails with an I/O error on aufs
- we are looking for a way to ensure that deleting from full file
  systems works
- we understand that, depending on our deletion rules, there can
  always be situations where disk space is exhausted and we end
  up with a full file system (BTW, in this case, we perform a
  reboot), but we are trying to limit that
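
The monitoring and escalation steps in the list above could be sketched
roughly as follows. This is only an illustration: the two thresholds and
the names used_fraction and classify are hypothetical and not taken from
the thread; shutil.disk_usage is the standard-library call doing the work.

```python
import shutil

# Hypothetical thresholds, chosen only for illustration.
WARN_FRACTION = 0.80       # start deleting rotated log files
EMERGENCY_FRACTION = 0.95  # last resort: recursive deletion


def used_fraction(path):
    """Return the fraction (0.0 to 1.0) of the file system at `path` in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def classify(fraction):
    """Map a usage fraction to one of three escalation levels."""
    if fraction >= EMERGENCY_FRACTION:
        return "emergency"
    if fraction >= WARN_FRACTION:
        return "cleanup"
    return "ok"


if __name__ == "__main__":
    frac = used_fraction("/")
    print(f"/ is {frac:.0%} full -> {classify(frac)}")
```

A periodic job (for example the 60-second check) would call classify and
pick the matching counter-measure.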

> For such case, bind-mount /var/log may be better approach.
> For example,
>         mount -t tmpfs none /rw
>         mount -t ext4 /dev/disk/by-uuid/... /ro
>         mount -t aufs -o br=/rw:/ro none /
>         mount -o bind /ro/var/log /var/log

Indeed, a nice idea!
Alternatively, we could use a second small tmpfs for /var/log
(and in one case that did help), but this does not help when a
log file (or any other file) is somewhere else.

>> Yes, someone cannot safely remountrw/remountro/remountrw an ext4
>         :::
>> bit problematic when it comes to be the root file system :)
>
> Whao, it is a surprise for me...

:) knew it :)

>> mmm... ok... What alternatives could be considered?
>
> At first, we need to make sure which file is so large. XINO or log
> files?

Any ordinary file on the file system, probably not XINO files.

Some artificial examples: log files (normally not; they are
checked every 60 seconds, so they would have to be flooded really
quickly to fill the disk), or "temporary" files that are not deleted
(because of bugs in scripts), or something like "tar czvf
bak/bak.tgz bak/" (which includes itself on each run), or...
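
For the backup example, a script can avoid packing its own archive by
skipping it explicitly. The following is a minimal sketch, not one of the
scripts in question; the function name backup_dir is hypothetical.

```python
import os
import tarfile


def backup_dir(src_dir, archive_path):
    """Write a gzip tarball of the files below src_dir, skipping the archive itself."""
    archive_real = os.path.realpath(archive_path)
    parent = os.path.dirname(os.path.abspath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # Only entries that os.walk reports as files are added; empty
        # directories are left out to keep the sketch short.
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.realpath(full) == archive_real:
                    continue  # do not pack the archive into itself
                arcname = os.path.relpath(full, parent)
                tar.add(full, arcname=arcname, recursive=False)
```

With this, calling backup_dir("bak", "bak/bak.tgz") repeatedly produces an
archive of the same content each time instead of a growing one.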

> If it is the log files, then I'd suggest you to put them on your flash
> by bind-mount. (Additionally you are "sync"ing files.)
> If it is the aufs XINO files,
> - at first, try truncating them by aufs re-mount options.
> - if it doesn't work well, check these.
>   + the size and the number of consumed blocks of XINO.
>   + the used inode numbers in /rw.
>     if those inode numbers are distributed very widely and equally, then
>     truncating XINO files may not work well since this "truncating" is
>     very similar to "cp --sparse=always". It means if the XINO file
>     doesn't contain any "file-hole", then nothing will be truncated.
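
To compare the size and the number of consumed blocks of a file, and so to
see whether it has any holes at all, something like the following could be
used. It is a rough sketch: the function names are hypothetical, the path
is whatever file you want to inspect, and the check is only a heuristic
that relies on st_blocks being counted in 512-byte units, as on Linux.

```python
import os
import sys


def allocation_report(path):
    """Return (apparent_bytes, allocated_bytes) for the file at `path`."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux, independent of the
    # file system's own block size.
    return st.st_size, st.st_blocks * 512


def has_holes(path):
    """Heuristic: fewer allocated bytes than the apparent size suggests holes."""
    apparent, allocated = allocation_report(path)
    return allocated < apparent


if __name__ == "__main__":
    for path in sys.argv[1:]:
        apparent, allocated = allocation_report(path)
        print(f"{path}: size={apparent} allocated={allocated} "
              f"holes={has_holes(path)}")
```

If the allocated bytes are about equal to the apparent size, there are no
holes, and a truncation that works like "cp --sparse=always" has nothing to
reclaim.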

As far as I understand, in our cases it was simply that there
were too many/too big ordinary files in the file system.

> By the way, I am going to try refining the XINO truncation logic in a
> few weeks.

Just in case this idea is not ridiculous:
some file systems, like ext4, reserve some space to be available
to root only. Could aufs reserve some space to be available to
XINO files only? This could create a safety margin. Of course the
usable disk size would be reduced by this margin, but because
XINO files could still be updated correctly, "rm" would work on a
full file system; then cleanup cron jobs could free disk space.

>> Do the debug files tell something helpful:
>>
>>   xi0:    1, 48x4096 61440
>>   xi1:    1, 216x4096 122340
>>   xib:    8x4096 4096
>>   xigen:  16x4096 8192
>>
>> what is causing the issue?
>
> The total number of blocks and bytes are
>         48 + 216 + 8 + 16 = 288 blocks
>         288 x 4096 = about 1.2 MB
>
> Absolutely I don't think your XINO file is large. But it depends upon
> the size of your /rw. How large is your /rw?
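
The arithmetic above can be reproduced directly from the debug output.
This is a small sketch that assumes, as in the reply, that each
"<n>x<size>" field means a number of blocks times a block size; the
function name xino_totals is hypothetical.

```python
import re

# Debug output as quoted earlier in this thread.
XINO_DEBUG = """\
xi0:    1, 48x4096 61440
xi1:    1, 216x4096 122340
xib:    8x4096 4096
xigen:  16x4096 8192
"""


def xino_totals(text):
    """Sum all '<blocks>x<block size>' fields; return (blocks, bytes)."""
    total_blocks = 0
    total_bytes = 0
    for blocks, size in re.findall(r"(\d+)x(\d+)", text):
        total_blocks += int(blocks)
        total_bytes += int(blocks) * int(size)
    return total_blocks, total_bytes


blocks, nbytes = xino_totals(XINO_DEBUG)
print(blocks, nbytes)  # 288 1179648
```

That is 288 blocks, or about 1.2 MB, matching the figures above.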

Filesystem                       Size  Used Avail Use% Mounted on
aufs                             505M  456K  505M   1% /
/dev/sda1                        935M  532M  356M  60% /ro
aufs-tmpfs                       505M  456K  505M   1% /rw

You guessed right, /dev/sda1 is the flash media :)
There are some other tmpfs (e.g. /tmp/).

Normally, the more than 300 MB of free space is far more than is
ever used, but from time to time bugs happen.

Given this environment, would it be recommended to put the XINO
files on their own tmpfs? Are there other recommendations?

Steffen
