In r1300, I incorporated your notes, except for #2 and #3. I dug
around for a CPAN module for stack traces, but couldn't find anything.
Maybe MTT could contain a stripped-down version of GDB for the sole
purpose of gathering stack traces? Or is there some other open-source
"stack trace grabber" tool out there?
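
To make the question concrete, here is roughly what I would want such a
grabber to do. This is only a sketch in Python (not MTT's Perl), and the
function names are made up for illustration. It assumes a stock gdb is on
the PATH and that we are allowed to attach to the process; it uses gdb's
-p, -batch and -ex options.

```python
import shutil
import subprocess


def gdb_backtrace_command(pid):
    """Build the gdb argv that attaches to pid, dumps every thread's
    backtrace, and exits (hypothetical helper, not part of MTT)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def grab_backtrace(pid, timeout=60):
    """Return gdb's output for pid as text.

    Returns None if gdb is not installed, cannot be started, or does
    not finish within `timeout` seconds.
    """
    if shutil.which("gdb") is None:
        return None
    try:
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout
```

Systems without gdb would simply get None back here, which is the part
that still needs generalizing per note #2.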

-Ethan
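
P.S. For anyone skimming the thread, the sentinel timeout idea in note #4
boils down to roughly the following. This is an illustrative Python
sketch, not the Perl that went into r1300; wait_for_sentinel and its
parameters are made-up names, and the polling interval is an assumption.

```python
import os
import time


def wait_for_sentinel(path, timeout_seconds, poll_seconds=1.0):
    """Block until the sentinel file is removed or the timeout passes.

    Returns True if the file disappeared on its own (someone removed it
    by hand), False if the timeout expired and this function removed it.
    """
    deadline = time.monotonic() + timeout_seconds
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # removed by hand between the check and the remove
            return False
        time.sleep(poll_seconds)
    return True
```

Either way the caller moves on afterwards; the return value only says
whether a person intervened or the timeout did.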

On Mon, Jun/22/2009 07:37:16AM, Jeff Squyres wrote:
> Actually, I think this would be fine for the trunk.  Some random notes:
>
> 1. It might be nice to move this logic out of the docommand sub itself and
> into its own sub.
> 2. It would also be good to generalize the ps and gdb commands for systems
> where those variants are not relevant.
> 3. It might even be good to generally develop the backtrace functionality
> overall -- backtraces would be really good to capture in the database...
> 4. How about having a[n optional] timeout with the sentinel file?  That is,
> it'll send a mail, then wait another timeout (e.g., 1 hour), and if the
> sentinel file still exists, MTT will remove the file and keep going.
>
>
> On Jun 19, 2009, at 2:47 PM, Ethan Mallove wrote:
>
>> Folks,
>>
>> I came up with a feature, which does not seem quite appropriate to go
>> into the MTT trunk, but is still possibly useful for someone other
>> than me. I have posted a note about it on the MTT wiki:
>>
>>   http://svn.open-mpi.org/trac/mtt/wiki/EmailTimeoutNotification
>>
>> Here's the text of the Wiki page:
>>
>> We (Sun) were trying to track down a hang in an MPI test that we were
>> seeing in our MTT runs and that was difficult to reproduce manually. The
>> problem is that MTT kills the hanging process before a developer has a
>> chance to investigate the issue. To address this, I patched an MTT
>> client (see attached patch file) to send out a notification email
>> containing the mpirun command line and a GDB backtrace for the hanging
>> test. A predefined sentinel file is touched, which can later be
>> removed to force MTT to move on and continue testing. Here are the INI
>> parameters to activate the timeout email notification:
>>
>>  * {{{docommand_timeout_sentinel_file}}}
>>  * {{{docommand_timeout_email_recipient}}}
>>
>> Example usage:
>>
>> {{{
>> $ client/mtt --scratch /foo/bar --file foo.ini \
>>     docommand_timeout_sentinel_file=/tmp/mtt-timeout-sentinel-file-\&random_string\(10\) \
>>     docommand_timeout_email_recipient=fred.flints...@sun.com,barney.rub...@sun.com
>> }}}
>>
>> -Ethan
>> _______________________________________________
>> mtt-devel mailing list
>> mtt-de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
>>
>
>
> -- 
> Jeff Squyres
> Cisco Systems
>
