-------- Original Message -------- Subject: CR 6853286 Created, P2 opensolaris/triage-queue bogus file attribute change events slow down gnome applications, after 6539657
*Synopsis*: bogus file attribute change events slow down gnome applications, after 6539657 *Change Request ID*: 6853286 *Synopsis*: bogus file attribute change events slow down gnome applications, after 6539657 Product: solaris Category: opensolaris Subcategory: triage-queue Type: Defect Subtype: Status: 1-Dispatched Substatus: Priority: 2-High Introduced In Release: Introduced In Build: Keywords: opensolaris === *Description* ============================================================ Category solaris/kernel (Solaris Kernel) Sub-Category event_ports Description After I bfu'ed my system using archives compiled from current onnv-gate sources (~ since 2009-6-17), the idle gnome desktop has started to consume *lots* of cpu time. The system is a SX:CE snv_111, bfu'ed with current onnv-gate bits. With bfu updates before ~ 2009-6-17 the idle gnome desktop didn't have a significant amount of cpu usage. I see several gnome processes (gnome-settings-daemon, nautilus, gnome-panel) each trying to consume 100% cpu time; the processes perform stat and port_associate system calls for files all over the place (/etc/xdg/..., X11 font files, files in my home directory, ~/.local/.... ). Looking though the change log for the last 2-3 days I noticed the putback for bug 6539657 "touch(1) does not set the nanosecond timestamp of a file correctly". When I disable the stat microsecond workaround[*] that was added with 6539657, the gnome desktop returns to sane behaviour; the idle gnome desktop does not consume lots of cpu time any more. [*] echo stat_force_usec_granularity/W0 | mdb -wk ----------------- Ok, seems that stat() on zfs is returning file attribute timestamps with the tv_nsec field rounded to microsecond granularity after the putback for 6539657. Gnome is using these timestamps to watch for file attribute changes using port_associate() and port_getn() system calls. Inside the kernel the tv_nsec fields for a file from a zfs are compared to the rounded tv_nsec timestamps that had been returned from stat(). They are different (the sub-microsecond part should be different), so that port_getn() returns bogus file attribute change events. These events seem to trigger some heavy rescanning of gnome desktop files, start menu file entries, fonts, ... in several gnome programs. Setting stat_force_usec_granularity to 0 returns full nanosecond timestamps, and port_associate() / port_getn() don't report bogus file timestamp change events. ----------------- ... /1: 0.0006clock_gettime(4, 0x08046C58)= 0 /1: 0.0005clock_gettime(4, 0x08046C58)= 0 /2:read(17, 0x9AC8EF54, 20)(sleeping...) /1: 1.3090pollsys(0x082611B8, 9, 0x08046DE8, 0x00000000)= 1 /1:fd=4 ev=POLLIN rev=POLLIN /1:fd=6 ev=POLLIN rev=0 /1:fd=10 ev=POLLIN|POLLPRI rev=0 /1:fd=13 ev=POLLIN|POLLPRI rev=0 /1:fd=14 ev=POLLIN|POLLPRI rev=0 /1:fd=15 ev=POLLIN|POLLPRI rev=0 /1:fd=16 ev=POLLIN|POLLPRI rev=0 /1:fd=19 ev=POLLIN rev=0 /1:fd=5 ev=POLLIN rev=0 /1:timeout: 0.000000000 sec /1: 0.0007port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0002port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 64 [0] /1: 0.0001port_getn(20, 0x08046930, 64, 1, 0x08046928)= 45 [0] /1: 0.0028lstat64("/etc/fonts/fonts.conf", 0x08046C40)= 0 /1: d=0x02D90002 i=12386 m=0100444 l=1 u=0 g=2 sz=14364 /1:at = Jun 20 11:34:17 CEST 2009 [ 1245490457 ] /1:mt = Mar 13 06:43:42 CET 2009 [ 1236923022 ] /1:ct = Apr 27 18:53:27 CEST 2009 [ 1240851207 ] /1: bsz=14848 blks=7 fs=zfs /1: 0.0003port_associate(20, 7, 0x080BB810, 0x10000006, 0x080B95C8) = 0 /1: 0.0004lstat64("/etc/vfstab", 0x08046C40)= 0 /1: d=0x02D90002 i=362618 m=0100644 l=1 u=0 g=3 sz=410 /1:at = Jun 20 11:44:42 CEST 2009 [ 1245491082 ] /1:mt = May 9 20:26:04 CEST 2009 [ 1241893564 ] /1:ct = Jun 20 11:08:49 CEST 2009 [ 1245488929 ] /1: bsz=512 blks=2 fs=zfs /1: 0.0001port_associate(20, 7, 0x080BD1B0, 0x10000006, 0x080B9898) = 0 /1: 0.0003lstat64("/usr/openwin/lib/X11/fonts/TrueType/FreeSansOblique.ttf", 0x08046C40) = 0 /1: d=0x02D90002 i=315877 m=0100755 l=1 u=0 g=2 sz=110740 /1:at = Mar 11 06:31:49 CET 2009 [ 1236749509 ] /1:mt = Mar 11 06:31:49 CET 2009 [ 1236749509 ] /1:ct = Apr 27 20:22:04 CEST 2009 [ 1240856524 ] /1: bsz=111104 blks=110 fs=zfs /1: 0.0002port_associate(20, 7, 0x080C1C68, 0x10000006, 0x080BEF18) = 0 /1: 0.0002lstat64("/usr/openwin/lib/X11/fonts/TrueType/LucidaBrightDemiBold.ttf", 0x08046C40) = 0 /1: d=0x02D90002 i=53833 m=0100444 l=1 u=0 g=2 sz=75144 /1:at = Apr 27 18:58:49 CEST 2009 [ 1240851529 ] /1:mt = Mar 13 08:41:11 CET 2009 [ 1236930071 ] /1:ct = Apr 27 18:58:45 CEST 2009 [ 1240851525 ] /1: bsz=75264 blks=95 fs=zfs /1: 0.0002port_associate(20, 7, 0x080C1D90, 0x10000006, 0x080BEF30) = 0 /1: 0.0002lstat64("/usr/openwin/lib/X11/fonts/TrueType/gulim.ttf", 0x08046C40) = 0 /1: d=0x02D90002 i=315884 m=0100755 l=1 u=0 g=2 sz=10385096 /1:at = Mar 11 06:31:49 CET 2009 [ 1236749509 ] /1:mt = Mar 11 06:31:49 CET 2009 [ 1236749509 ] /1:ct = Apr 27 20:22:08 CEST 2009 [ 1240856528 ] /1: bsz=131072 blks=8755 fs=zfs /1: 0.0002port_associate(20, 7, 0x080C1E80, 0x10000006, 0x080BEF48) = 0 /1: 0.0003lstat64("/usr/openwin/lib/X11/fonts/TrueType/hline.ttf", 0x08046C40) = 0 /1: d=0x02D90002 i=315885 m=0100755 l=1 u=0 g=2 sz=1203464 /1:at = Apr 27 21:59:14 CEST 2009 [ 1240862354 ] /1:mt = Mar 11 06:31:49 CET 2009 [ 1236749509 ] /1:ct = Apr 27 20:22:08 CEST 2009 [ 1240856528 ] /1: bsz=131072 blks=808 fs=zfs /1: 0.0002port_associate(20, 7, 0x080C2918, 0x10000006, 0x080BEE88) = 0 ... ----------------- #!/usr/sbin/dtrace -s BEGIN { PORT_FOP_ACTIVE = 1; FILE_MODIFIED = 2; } fbt::port_check_timestamp:entry /(((portfop_t *)arg3)->pfop_flags & PORT_FOP_ACTIVE) && (((portfop_t *)arg3)->pfop_events & FILE_MODIFIED)/ { this->vp = (vnode_t*)arg1; this->pfp = arg3; this->objptr = arg4; this->fobj = (file_obj_t*)this->objptr; printf("at:%u.%09u mt:%u.%09u ct:%u.%09u", this->fobj->fo_atime.tv_sec, this->fobj->fo_atime.tv_nsec, this->fobj->fo_mtime.tv_sec, this->fobj->fo_mtime.tv_nsec, this->fobj->fo_ctime.tv_sec, this->fobj->fo_ctime.tv_nsec); self->d = stackdepth; } fbt::vhead_getattr:entry /stackdepth == self->d + 2/ { self->vap = (vattr_t*)arg1; } fbt::vhead_getattr:return /self->vap != 0/ { printf("at:%u.%09u mt:%u.%09u ct:%u.%09u", self->vap->va_atime.tv_sec, self->vap->va_atime.tv_nsec, self->vap->va_mtime.tv_sec, self->vap->va_mtime.tv_nsec, self->vap->va_ctime.tv_sec, self->vap->va_ctime.tv_nsec); self->vap = 0; self->d = 0; } ... 0 50545 port_check_timestamp:entry at:1236929113.000000000 mt:1236929113.000000000 ct:1240851088.196400000 0 10004 vhead_getattr:return at:1236929113.000000000 mt:1236929113.000000000 ct:1240851088.196400740 0 50545 port_check_timestamp:entry at:1240862492.034252000 mt:1236929113.000000000 ct:1240851088.197096000 0 10004 vhead_getattr:return at:1240862492.034252384 mt:1236929113.000000000 ct:1240851088.197096808 0 50545 port_check_timestamp:entry at:1240862491.900676000 mt:1236929112.000000000 ct:1240851088.195174000 0 10004 vhead_getattr:return at:1240862491.900676696 mt:1236929112.000000000 ct:1240851088.195174171 ... Frequency Always Regression Solaris 10 Steps to Reproduce Start gnome desktop on a system that was bfu'ed to bits containing the putback for 6539657. Check "prstat" or "top" output for the idle desktop. Expected Result more than 99% of idle cpu time. Actual Result 0% idle cpu time; several gnome processes consume all available cpu cycles. Error Message(s) Test Case Workaround echo stat_force_usec_granularity/W0 | mdb -wk Additional configuration information SX:CE b111, bfu'ed to current opensolaris bits compiled from onnv-gate mercurial source repository. === *Public Comments* ======================================================== === *Comments* =============================================================== === *Evaluation* ============================================================= === *Suggested Fix* ========================================================== === *Workaround* ============================================================= === *Justification* ========================================================== Priority changed from [] to [2-High] system becomes more or less unusable after gnome desktop has started === *Additional Details* ===================================================== Targeted Release: Commit To Fix In Build: Fixed In Build: Integrated In Build: Verified In Build: See Also: 6539657 Duplicate of: Program Management: Root Cause: Is a Security Vulnerability?: No Fix Affects Documentation: No Fix Affects Localization: No Reported by: === *History* ================================================================ Date Submitted: 2009-06-20 16:48:14 GMT+00:00 Status Changed Date Updated Updated By