On Tuesday 21 Mar 2017 22:50:04 Kai Krakow wrote:
> On Tue, 21 Mar 2017 23:22:48 +0200,
> Alan McKinnon <alan.mckin...@gmail.com> wrote:
> > On 21/03/2017 22:16, Kai Krakow wrote:
> > > Test one by one... Either disable all, then enable one by one, or
> > > vice-versa.
> > > 
> > > Chances are that your FS may be blocking on sync. Do you maybe have
> > > a very high value in /proc/sys/vm/dirty_background_{ratio,bytes}?
> > > 
> > > If ratio is 0, then bytes is used. Ratio is a percentage of your
> > > available memory (roughly your physical RAM). With the default
> > > kernel value on modern systems, this is ridiculously high for
> > > desktop systems. Maybe put a fixed value, like 128MB. The dirty
> > > background value is the amount of outstanding writes at which the
> > > kernel starts writing dirty data back in the background. If
> > > this value is high, a sync may cause processes to freeze for a long
> > > time. Setting this to a lower value starts writeback early, so
> > > less dirty data piles up before writers have to block.
> > > 
> > > The next value to check is dirty_{ratio,bytes}. That is the combined
> > > maximum of outstanding data before the cache must be flushed. If
> > > this is hit, all writing processes freeze. So, having the
> > > background value high gives a greater chance of hitting this early.
> > > 
> > > The default values are 10% and 20% (ratio). I've made the 20% ratio
> > > into 10% and put 128MB for background which works quite well:
> > > Foreground processes are blocked for shorter times (because writing
> > > 128MB can take a few seconds or less, where 1.6GB can take minutes in
> > > the worst case, so if the overall limit is hit, I'm screwed). The overall
> > > dirty buffer is still big enough to let the system buffer writes of
> > > multiple processes. My system has 16GB RAM, you may want to adjust
> > > it or try different values.
> > > 
> > > $ cat /etc/sysctl.d/98-caching.conf
> > > vm.dirty_background_bytes = 134217728
> > > vm.dirty_ratio = 10
> > > 
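
The 134217728 above is simply 128 MiB written out in bytes. A minimal sketch for deriving such values for other sizes (the helper name mib_to_bytes is made up for illustration, it is not an existing tool):

```shell
# Convert a size in MiB to bytes, the unit the vm.dirty_*_bytes sysctls expect.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 128    # prints 134217728
```
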
> > > Maybe point your Firefox cache to a tmpfs. If you're using tmpfs,
> > > don't set swappiness too low, otherwise data sitting in tmpfs cannot
> > > be swapped out and will cause filesystem caches to be discarded too
> > > early. I'm working with a 32GB tmpfs and standard swappiness for
> > > emerge, and I see no problems although multiple gigabytes of emerge
> > > build data may be swapped out. Still, emerge is so much faster now.
> > > But then, my swaps are on different disks (and I have multiple for
> > > getting some RAID-like striping of swap space).
> > > 
> > > Also, depending on which FS you're using, trying deadline instead of
> > > CFQ may greatly improve your desktop experience (browsers should
> > > benefit most from this).
> > 
> > You may be onto something here:
> > 
> > This is an 8-core i7 laptop, 16G RAM
> > 
> > $ sudo cat /proc/sys/vm/dirty_background_bytes
> > 0
> > $ sudo cat /proc/sys/vm/dirty_background_ratio
> > 5
> > $ sudo cat /proc/sys/vm/dirty_bytes
> > 0
> > $ sudo cat /proc/sys/vm/dirty_ratio
> > 10
> 
> With a 16 GB machine, I recommend not working with the ratio values and
> sticking to the bytes values. 1% steps are just too coarse. Put some
> reasonable values there.
> 
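
To put numbers on how coarse the ratio steps are with 16 GB, here is a rough sketch that converts a ratio into bytes. It assumes the percentage applies to the full 16 GiB; the kernel really applies it to available (free plus reclaimable) memory, so the real thresholds are somewhat lower. The function name ratio_to_bytes is hypothetical.

```shell
# ratio_to_bytes TOTAL_KIB PERCENT
# Rough upper bound: applies PERCENT to the total RAM given in KiB.
ratio_to_bytes() {
    echo $(( $1 * 1024 * $2 / 100 ))
}

total_kib=$(( 16 * 1024 * 1024 ))   # 16 GiB, assumed for illustration
ratio_to_bytes "$total_kib" 5       # dirty_background_ratio = 5
ratio_to_bytes "$total_kib" 10      # dirty_ratio = 10
ratio_to_bytes "$total_kib" 1       # size of a single 1% step
```

Under that assumption a single 1% step is roughly 164 MiB, which is already more than the 128MB background value suggested earlier in the thread.
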
> > browser cache is on a regular laptop spinning-rust 500G disk
> 
> Try moving the cache to tmpfs just for the sake of eliminating that...
> Nowadays, /tmp is usually mounted with tmpfs (at least it should be),
> otherwise mount tmpfs somewhere below /mnt and make it chmod 1777. Then
> create a cache directory there, rename your browser cache (while the
> browser is not running) and instead put a symlink to the newly created
> directory. Now do some tests without rebooting. If it works, create an
> fstab entry to mount a tmpfs directly to your firefox cache directory
> with correct permissions. Of course, it would be lost on reboots.
> 
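
The rename-and-symlink step described above could look roughly like this. It is only a sketch: relocate_cache is a made-up helper, both paths are supplied by the caller, and it assumes the tmpfs is already mounted and the browser is not running.

```shell
# relocate_cache CACHE_DIR TMPFS_DIR
# Moves the existing cache aside and replaces it with a symlink to a
# directory on an already-mounted tmpfs.
relocate_cache() {
    cache_dir=$1
    tmpfs_dir=$2
    [ -d "$cache_dir" ] || return 1                # nothing to relocate
    [ -e "$cache_dir.orig" ] && return 1           # do not clobber an old backup
    mkdir -p "$tmpfs_dir" || return 1              # cache directory on the tmpfs
    mv "$cache_dir" "$cache_dir.orig" || return 1  # keep the old cache around
    ln -s "$tmpfs_dir" "$cache_dir"                # browser now writes to tmpfs
}
```

To undo the test, remove the symlink and rename the .orig directory back.
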
> An alternative could be to put the cache on an FS with better write
> performance, like NILFS2 (it does linear writes only, so reading will
> suffer, but reading is not that sensitive to blocking). Reiserfs can
> also perform well when fsyncs are involved, but it doesn't scale well
> to parallel accesses (which is not so relevant for desktop usage,
> especially for a browser cache). Also, XFS has always performed very
> well for me (better than Ext4), for desktop and server usage. But that
> only makes sense if you convert your whole system to it, and it cannot
> show its benefits on single-disk systems.
> 
> > IO scheduler is BFQ, I use it for ages now.
> 
> Yes, good choice. I'd use it, too. But it causes trouble with btrfs
> (results in system freezes with fs corruption when I run VirtualBox).
> 
> > I did tests some years back and found it overall the best for an
> > interactive desktop with a DE. I haven't repeated those tests since;
> > have there been significant changes in this area in the last year or
> > three?
> 
> It still performs very well. The next best option for me was using
> deadline. CFQ is an interactivity killer.
> 
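
To check which scheduler a disk is currently using: the kernel lists the available ones in /sys/block/<dev>/queue/scheduler, with the active one in square brackets. A small sketch for picking it out (active_scheduler is a made-up name, and sda is only an example device):

```shell
# active_scheduler FILE
# Prints the entry in square brackets, i.e. the scheduler currently in use.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Example device, adjust to your disk.
sched_file=/sys/block/sda/queue/scheduler
if [ -r "$sched_file" ]; then
    active_scheduler "$sched_file"
fi
```

Writing a scheduler name into the same file as root switches it for that device until the next reboot.
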
> I'm combining this with bcache. That's a cache between kernel and
> filesystem that you put on SSD. Apparently, it requires repartitioning
> to map your filesystem through bcache (it has to add a protective
> superblock in front of your FS). So, a small SSD + bcache can make your
> complete 500GB spinning rust act mostly like an SSD performance-wise.
> 
> I think there's a script that can move your FS 8 kB forward on HDD to
> add that bcache superblock. But I wouldn't try that without backup and
> some spare time. But it is a performance wonder.
> 
> Using 3x 1TB btrfs RAID + 500GB bcache here. The system feels like an
> SSD system but I don't have to decide what to put on a small SSD and
> what to put on big slow storage. It's just automagic. ;-)
> 
> BTW: Laptop disks are usually really slow because most manufacturers
> only build them with 5400 RPM disks. Maybe get a hybrid disk instead if
> you only have one slot. I think Seagate still makes those. It should
> have benefits similar to bcache.


A desktop started having problems similar to Alan's since the last upgrade:

     Installed versions:  45.8.0^d(18:42:51 03/14/17)(dbus ffmpeg
       gmp-autoupdate gstreamer jemalloc3 jit pulseaudio
       startup-notification system-harfbuzz system-icu system-jpeg
       system-libevent system-libvpx system-sqlite -bindist
       -custom-cflags -custom-optimization -debug -hardened -hwaccel
       -neon -pgo -selinux -system-cairo -test -wifi
       L10N="en-GB -ach -af -an -ar -as -ast -az -be -bg -bn-BD -bn-IN
       -br -bs -ca -cs -cy -da -de -el -en-ZA -eo -es-AR -es-CL -es-ES
       -es-MX -et -eu -fa -fi -fr -fy -ga -gd -gl -gu -he -hi -hr -hsb
       -hu -hy -id -is -it -ja -kk -km -kn -ko -lt -lv -mai -mk -ml -mr
       -ms -nb -nl -nn -or -pa -pl -pt-BR -pt-PT -rm -ro -ru -si -sk
       -sl -son -sq -sr -sv -ta -te -th -tr -uk -uz -vi -xh -zh-CN
       -zh-TW")
     Homepage:            http://www.mozilla.com/firefox
     Description:         Firefox Web Browser

I thought it might be related to profile-sync-daemon (psd) mapping the browser
cache to /tmp, but I have not found the cause of this problem.  I noticed it
happening when the user is creating new bookmarks, but I am not 100% sure.
Unlike Alan's case, here the whole PC may lock up, or at least the keyboard
is lost and I have to ssh in.  Typically one core is pegged at 100%.
Killing firefox recovers the OS.  Sometimes the lock-up is too far gone by the
time I am called and a 3 finger salute is necessary.

I'll ask the user to start it from a terminal next time in case something more 
meaningful shows up.
-- 
Regards,
Mick

