Your message dated Sun, 8 Apr 2012 19:40:20 +0200
with message-id <20120408174020.GA20622@meiner>
and subject line Re: [condor-debian] Bug#667478: condor: RSS memory usage grows
continuously for Condor jobs
has caused the Debian Bug report #667478,
regarding condor: RSS memory usage grows continuously for Condor jobs
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
667478: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=667478
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: condor
Version: 7.7.5~dfsg.1-2
Severity: normal
Hi,
We are running a backport of the Debian package of Condor 7.7.5 from
experimental on a cluster of Debian stable machines. Since upgrading
from 7.7.4 we noticed an increased memory demand for pretty much all
jobs.
I recently ran a week-long job that starts off at 10GB size and should
not gain significant memory size throughout the process (as confirmed
with Condor 7.7.4). After the upgrade to 7.7.5 the job continuously
increases its memory demands and I have to kill it after two days when it
exceeds 150GB consumption. However, the continuous growth is not limited
to this particular job -- most types of long-running jobs on this machine
are Python-based, though.
Looking into the 7.7.5 changelog I see a number of memory-related
changes, but nothing that is a perfect match. I checked that this is not
just Condor reporting increasing memory consumption: the respective
cluster nodes actually run out of memory, because the job grows and
grows.
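A check of this kind, independent of what Condor reports, can be sketched
as follows. This is only an illustration under assumptions: the helper
names are hypothetical and this is not the script that was actually run.
It reads the resident set size of a process straight from the kernel via
/proc/<pid>/status on Linux.

```python
import re


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None.

    Kernel threads have no VmRSS line, hence the None case.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size (KiB) of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_vmrss_kib(handle.read())
```

Calling read_rss_kib() with the PID of the job at regular intervals and
logging the result gives a growth curve that does not depend on Condor's
own accounting.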
I'd be glad to get some feedback on what the problem could be and if
there is a workaround.
Thanks.
-- System Information:
Debian Release: 6.0.4
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/24 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages condor depends on:
ii adduser 3.112+nmu2 add and remove users and groups
ii debconf [debconf-2 1.5.36.1 Debian configuration management sy
ii libc6 2.11.3-3 Embedded GNU C Library: Shared lib
ii libcgroup1 0.37.1-1~nd60+1 Library to control and monitor con
ii libclassad3 7.7.5~dfsg.1-2~nd60+1 library for Condor's classads expr
ii libcomerr2 1.41.12-4stable1 common error description library
ii libcurl3 7.21.0-2.1+squeeze1 Multi-protocol file transfer libra
ii libdate-manip-perl 6.11-1 module for manipulating dates
ii libexpat1 2.0.1-7 XML parsing C library - runtime li
ii libgcc1 1:4.4.5-8 GCC support library
ii libglobus-callout0 0.7-6 Globus Toolkit - Globus Callout Li
ii libglobus-common0 11.5-2 Globus Toolkit - Common Library
ii libglobus-ftp-cont 2.11-2 Globus Toolkit - GridFTP Control L
ii libglobus-gass-tra 4.3-2 Globus Toolkit - Globus Gass Trans
ii libglobus-gram-cli 10.4-1 Globus Toolkit - GRAM Client Libra
ii libglobus-gram-pro 9.7-2 Globus Toolkit - GRAM Protocol Lib
ii libglobus-gsi-call 2.7-1 Globus Toolkit - Globus GSI Callba
ii libglobus-gsi-cert 6.6-1 Globus Toolkit - Globus GSI Cert U
ii libglobus-gsi-cred 3.5-1 Globus Toolkit - Globus GSI Creden
ii libglobus-gsi-open 0.14-6 Globus Toolkit - Globus OpenSSL Er
ii libglobus-gsi-prox 4.5-1 Globus Toolkit - Globus GSI Proxy
ii libglobus-gsi-prox 2.3-1 Globus Toolkit - Globus GSI Proxy
ii libglobus-gsi-sysc 3.1-2 Globus Toolkit - Globus GSI System
ii libglobus-gss-assi 5.9-1 Globus Toolkit - GSSAPI Assist lib
ii libglobus-gssapi-e 2.5-7 Globus Toolkit - GSSAPI Error Libr
ii libglobus-gssapi-g 7.5-2 Globus Toolkit - GSSAPI library
ii libglobus-io3 6.3-8 Globus Toolkit - uniform I/O inter
ii libglobus-openssl- 1.3-1 Globus Toolkit - Globus OpenSSL Mo
ii libglobus-rsl2 7.2-2 Globus Toolkit - Resource Specific
ii libglobus-xio0 2.8-3 Globus Toolkit - Globus XIO Framew
ii libgssapi-krb5-2 1.8.3+dfsg-4squeeze5 MIT Kerberos runtime libraries - k
ii libk5crypto3 1.8.3+dfsg-4squeeze5 MIT Kerberos runtime libraries - C
ii libkrb5-3 1.8.3+dfsg-4squeeze5 MIT Kerberos runtime libraries
ii libkrb5support0 1.8.3+dfsg-4squeeze5 MIT Kerberos runtime libraries - S
ii libldap-2.4-2 2.4.23-7.2 OpenLDAP libraries
ii libltdl7 2.2.6b-2 A system independent dlopen wrappe
ii libpcre3 8.02-1.1 Perl 5 Compatible Regular Expressi
ii libssl0.9.8 0.9.8o-4squeeze7 SSL shared libraries
ii libstdc++6 4.4.5-8 The GNU Standard C++ Library v3
ii libuuid1 2.17.2-9 Universally Unique ID library
ii libvirt0 0.8.3-5+squeeze2 library for interfacing with diffe
ii libxml2 2.7.8.dfsg-2+squeeze3 GNOME XML library
ii perl 5.10.1-17squeeze3 Larry Wall's Practical Extraction
ii zlib1g 1:1.2.3.4.dfsg-3 compression library - runtime
Versions of packages condor recommends:
ii dmtcp 1.2.4-1 Checkpoint/Restart functionality f
condor suggests no packages.
--- End Message ---
--- Begin Message ---
On Wed, Apr 04, 2012 at 05:34:16PM +0200, Michael Hanke wrote:
> > You can also try running condor_ssh_to_job while a job is running to get
> > an interactive session with the same environment as your job. You can
> > examine the environment variables, etc. for any odd settings. You can even
> > submit a sleep job, then use condor_ssh_to_job to start your program
> > interactively in the environment Condor sets up, possibly tweaking
> > environment variables first.
>
> I haven't done that yet, and will test this next -- thanks for this
> suggestion! I will report back if I can replicate the behavior.
I'm closing this bug as the source of the problem was actually a memory leak in
the job -- a small but critical difference between the test jobs inside
and outside of Condor made me blame the wrong component.
Sorry for the noise, I'll make it up by quickly upgrading the package to
7.7.6 ;-)
Michael
--
Michael Hanke
http://mih.voxindeserto.de
--- End Message ---