I'll look into it in the next couple of days.

Thanks
Edgar

George Bosilca wrote:
> This is an artifact of using the gatherv (or the scatterv) on an
> inter-communicator without any useful data (i.e. either count of zero or
> empty datatypes). Looks more like a synchronization than a real operation.
>
>   george.
>
> On May 5, 2010, at 20:17, Lisandro Dalcin wrote:
>
>> After building 1.4.2 with debug flags to configure, I get this (I've
>> got these warnings in previous releases, too):
>>
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>>
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>>
>> --
>> Lisandro Dalcin
>> ---------------
>> CIMEC (INTEC/CONICET-UNL)
>> Predio CONICET-Santa Fe
>> Colectora RN 168 Km 472, Paraje El Pozo
>> Tel: +54-342-4511594 (ext 1011)
>> Tel/Fax: +54-342-4511169
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> From: r...@osl.iu.edu
> To: s...@open-mpi.org
> Date: Thu, 6 May 2010 16:57:17 -0400
> Subject: [OMPI svn] svn:open-mpi r23106
> Reply-To: de...@open-mpi.org
>
> Author: rhc
> Date: 2010-05-06 16:57:17 EDT (Thu, 06 May 2010)
> New Revision: 23106
> URL: https://svn.open-mpi.org/trac/ompi/changeset/23106
>
> Log:
> More cleanup on paffinity....groan
>
> It is okay to not have a paffinity module IF you aren't using paffinity
> anyway. So don't error out of MPI_Init because a paffinity module wasn't
> selected.
>
> Cleanup error reporting in the odls default module to (once and for all!)
> eliminate messages originating in the fork'd process. Create some new error
> codes to allow us to pass enough info back to the parent process to provide
> useful error messages.
>
> Text files modified:
>    trunk/opal/include/opal/constants.h                     |  63 +++++++-----
>    trunk/opal/mca/paffinity/base/paffinity_base_select.c   |  17 +-
>    trunk/opal/mca/paffinity/base/paffinity_base_service.c  |  72 +++---------
>    trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c |  20 +-
>    trunk/opal/runtime/opal_init.c                          |  32 +++++
>    trunk/orte/include/orte/constants.h                     |  79 ++++++++------
>    trunk/orte/mca/odls/default/help-odls-default.txt       |  34 ++++-
>    trunk/orte/mca/odls/default/odls_default_module.c       | 206 ++++++++++++++++-----------------------
>    trunk/orte/util/error_strings.c                         |   3
>    9 files changed, 256 insertions(+), 270 deletions(-)
>
> Diff not shown due to size (56064 bytes).
> To see the diff, run the following command:
>
>    svn diff -r 23105:23106 --no-diff-deleted
>
> _______________________________________________
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>
> From: Stefan Wesner <wes...@hlrs.de>
> To: Rolf Rabenseifner <rabenseif...@hlrs.de>
> Cc: Edgar Gabriel <gabr...@cs.uh.edu>,
>  Edgar Gabriel <egabr...@uh.edu>,
>  Rainer Keller <kel...@hlrs.de>,
>  EuroMPI2010 <eurompi2...@easychair.org>
> Date: Thu, 6 May 2010 23:12:22 +0200
> Subject: Re: EuroMPI2010
>
> Hi,
>
> for whatever reason, kel...@hlrs.de is configured as the forwarding
> email address...
>
> Stefan.
> --
> Dr.-Ing. Stefan Wesner
> Deputy Director, Head of Applications & Visualization
> High Performance Computing Center of University Stuttgart
> Nobelstrasse 19, D-70569 Stuttgart, Germany
> Phone: +49 711-685 6 4275
> Mobile: +49 172 1354054
> Fax: +49 711-685 5 4275
>
> On 06.05.2010, at 21:28, Rolf Rabenseifner wrote:
>
>> Hello Edgar,
>>
>> Cool, I have just uploaded Review-Paper-20.
>> It even works without logging in again.
>> Do you now also receive the mails sent to eurompi2...@easychair.org?
>>
>> Many thanks.
>> Rolf
>>
>> ----- Original Message -----
>>> I think I see the problem:
>>>
>>> In the configuration there is the menu item 'Can non-chairs add or
>>> modify reviews', and it was set to 'no'. I have now set it to 'yes',
>>> logged in as Rolf, and now the menu item for uploading the papers
>>> is there.
>>>
>>> Best regards
>>> Edgar
>>>
>>> Rolf Rabenseifner wrote:
>>>> Thanks Edgar,
>>>>
>>>> You should perhaps also check whether other mails to
>>>> eurompi2...@easychair.org simply went unanswered,
>>>> or whether mine was the only one.
>>>> And why they do not reach you, Edgar.
>>>>
>>>> Best regards
>>>> Rolf
>>>>
>>>> ----- Original Message -----
>>>>> yep, I have additional menu items.
>>>>>
>>>>> However, I logged in as another member of the program committee,
>>>>> and the item does indeed seem to be missing.
>>>>> Rainer/Stefan, since you paid for the premium EasyChair service,
>>>>> can one of you ping their people about what is wrong there?
>>>>>
>>>>> Best regards
>>>>> Edgar
>>>>>
>>>>> Rolf Rabenseifner wrote:
>>>>>> Hello Edgar,
>>>>>>
>>>>>> thanks regarding paper 23.
>>>>>>
>>>>>> Regarding the upload:
>>>>>>
>>>>>> This menu item does not exist for me!!!!!!!!!
>>>>>>
>>>>>> In the "Reviews" menu, the following items appear both as a
>>>>>> pop-up and as a web page:
>>>>>> http://www.easychair.org/conferences/review.cgi?a=p0266d40517a
>>>>>>
>>>>>> Reviews
>>>>>> Select one of the following options.
>>>>>>
>>>>>> - Reviews on papers assigned to me
>>>>>> - Download offline review forms
>>>>>> - Subreviewers
>>>>>>
>>>>>> Does the page look different for you?
>>>>>>
>>>>>> Kind regards
>>>>>> Rolf
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> Hello Rolf,
>>>>>>>
>>>>>>> Rolf Rabenseifner wrote:
>>>>>>>> Hello Stefan, Rainer and Edgar,
>>>>>>>>
>>>>>>>> I do not know whether my mails to eurompi2...@easychair.org
>>>>>>>> actually arrive anywhere, so I am now also writing to you
>>>>>>>> directly.
>>>>>>> hm, to be honest I have not seen any email from you. Does
>>>>>>> eurompi2...@easychair.org really forward the emails to us, or
>>>>>>> where do they go? I do not see anything in the spam filters
>>>>>>> either.
>>>>>>>
>>>>>>>> 2 problems:
>>>>>>>>
>>>>>>>> - at
>>>>>>>> http://www.easychair.org/conferences/review.cgi?a=p0266d40517a
>>>>>>>> i.e. under "Reviews" or "My papers", I find no way to submit
>>>>>>>> my review, i.e. these reviews_form.txt files.
>>>>>>>> Unless I am simply too dumb to spot the obvious place right
>>>>>>>> away, this problem should be solved on your side as quickly as
>>>>>>>> possible, since it then probably affects the entire Program
>>>>>>>> Committee.
>>>>>>> I just tried it: if you go to the 'Upload reviews' button under
>>>>>>> Reviews, you can upload the form. That seems to work.
>>>>>>>
>>>>>>>> - Paper 23 is still not visible again in my "Reviews-->my
>>>>>>>> papers" list; see the attached mail, to which I never received
>>>>>>>> an answer.
>>>>>>>> (Perhaps I should have written it in German, but I do not know
>>>>>>>> who all is registered on eurompi2...@easychair.org.)
>>>>>>> done, paper 23 is now additionally assigned to you.
>>>>>>> Normally we do not protest when someone volunteers to do more
>>>>>>> work.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Edgar
>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Rolf
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I did not receive any answer within the last 3 days.
>>>>>>>>> I expect that you are in a state where you are not
>>>>>>>>> able to make further changes without problems.
>>>>>>>>> Therefore, I'll review the currently assigned papers 20 and 22.
>>>>>>>>> Please also assign paper 23 to me because I've
>>>>>>>>> already done parts of the review.
>>>>>>>>>
>>>>>>>>> I need the decision because I'm flying many hours
>>>>>>>>> next week, which is always a good time for reviewing.
>>>>>>>>>
>>>>>>>>> Best regards and happy weekend
>>>>>>>>> Rolf
>>>>>>>>>
>>>>>>>>> ----- Original Message (Apr. 27) -----
>>>>>>>>>> Hi Stefan, Rainer, Edgar,
>>>>>>>>>>
>>>>>>>>>> I started already yesterday to review paper 23.
>>>>>>>>>> Yesterday, I was assigned to papers 22, 23, and 24.
>>>>>>>>>>
>>>>>>>>>> Papers in my main area of expertise are:
>>>>>>>>>> - 23 (area of my PhD and work in last 3 years),
>>>>>>>>>> - 2, 10 (related to my work of optimization of
>>>>>>>>>>   collective reduction operations).
>>>>>>>>>>
>>>>>>>>>> An additional conflict is paper 21, because Rainer is
>>>>>>>>>> a direct colleague at HLRS. I entered this conflict into
>>>>>>>>>> the EasyChair database and therefore, the paper was removed
>>>>>>>>>> from my review list.
>>>>>>>>>>
>>>>>>>>>> It would be nice,
>>>>>>>>>> - if I can get back paper 23 for review.
>>>>>>>>>> - if you can substitute papers 20+22 by 2+10.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Rolf
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> Dear Rolf,
>>>>>>>>>>> the papers of the EuroMPI2010 conference have been assigned
>>>>>>>>>>> to the PC members. Please make sure that the automatic
>>>>>>>>>>> conflict detection worked fine.
>>>>>>>>>>>
>>>>>>>>>>> We kindly ask you to please log into EasyChair
>>>>>>>>>>> (http://www.easychair.org) to check & download your assigned
>>>>>>>>>>> papers and notify us of further conflicts by
>>>>>>>>>>> 30th of April.
>>>>>>>>>>>
>>>>>>>>>>> We would like to encourage a discussion on papers where
>>>>>>>>>>> the reviews show different opinions.
>>>>>>>>>>> Therefore we would like to ask you to please submit your
>>>>>>>>>>> review by 12th of May
>>>>>>>>>>> to be able to make the deadline for the notification of
>>>>>>>>>>> authors on 20th of May.
>>>>>>>>>>> If you cannot proceed with the review or this is not
>>>>>>>>>>> convenient for you, please do not hesitate to contact us and
>>>>>>>>>>> we will submit the papers to another reviewer.
>>>>>>>>>>>
>>>>>>>>>>> The review process will be open but anonymous -- you should
>>>>>>>>>>> be able to see other reviewers' input after you have
>>>>>>>>>>> submitted your review. If you have any questions, please do
>>>>>>>>>>> not hesitate to contact us.
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> the Program Chairs of EuroMPI 2010.
>>>>>>>>>> --
>>>>>>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseif...@hlrs.de
>>>>>>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>>>>>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>>>>>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>>>>>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)
>>>>>>> --
>>>>>>> Edgar Gabriel
>>>>>>> Assistant Professor
>>>>>>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>>>>>>> Department of Computer Science          University of Houston
>>>>>>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>>>>>>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>>
>> --
>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseif...@hlrs.de
>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>> Nobelstr. 19, D-70550 Stuttgart, Germany .
>> (Office: Allmandring 30)
>
Message-ID: <4be33485.9090...@ldeo.columbia.edu> > Date: Thu, 06 May 2010 17:28:37 -0400 > From: Gus Correa <g...@ldeo.columbia.edu> > User-Agent: Thunderbird 2.0.0.23 (X11/20090825) > MIME-Version: 1.0 > To: Open MPI Users <us...@open-mpi.org> > References: <4be08f2a.6000...@ldeo.columbia.edu> > <441acf3b-34a9-4ff5-b78e-b9a8df4e8...@cisco.com> > <4be09531.9040...@ldeo.columbia.edu> > <0d4abfdd-9802-4d77-bf70-c7ec3198f...@open-mpi.org> > <4be0a505.2000...@ldeo.columbia.edu> <4be0cb62.7080...@ldeo.columbia.edu> > <10b2585f-576a-4b18-a83e-e8e165823...@cisco.com> > <4be1ab3a.4010...@ldeo.columbia.edu> > <9a3fcc9c-56de-4de4-a781-460ccc083...@open-mpi.org> > <4be1ec79.3030...@ldeo.columbia.edu> <20100505235456.GA5622@sopalepc> > <7dc1d35d-11c6-4f4b-870a-031ff11f7...@open-mpi.org> > <4be2d427.4090...@oracle.com> <4be2f269.1090...@ldeo.columbia.edu> > <4be2f857.9090...@oracle.com> > <4be303ff.4020...@ldeo.columbia.edu> > <a70200b0-1eba-4212-a0d6-22cb34053...@lanl.gov> > In-Reply-To: <a70200b0-1eba-4212-a0d6-22cb34053...@lanl.gov> > X-Scanned-By: MIMEDefang 2.64 on 129.236.19.105 > X-PMX-Version: 5.5.9.388399, Antispam-Engine: 2.7.2.376379, > Antispam-Data: 2010.5.6.211515 > X-PerlMx-Spam: Gauge=X, Probability=10%, Report=' > TO_IN_SUBJECT 0.5, BODY_SIZE_6000_6999 0, BODY_SIZE_7000_LESS 0, > __BOUNCE_CHALLENGE_SUBJ 0, __BOUNCE_NDR_SUBJ_EXEMPT 0, > __CP_URI_IN_BODY 0, __CT 0, __CTE 0, __CT_TEXT_PLAIN 0, > __HAS_MSGID 0, __MIME_TEXT_ONLY 0, __MIME_VERSION 0, > __MOZILLA_MSGID 0, __SANE_MSGID 0, __TO_MALFORMED_2 0, __URI_NS , > __USER_AGENT 0' > Subject: Re: [OMPI users] How do I run OpenMPI safely on > a Nehalem standalone machine? 
> X-BeenThere: us...@open-mpi.org
> X-Mailman-Version: 2.1.11rc1
> Precedence: list
> Reply-To: Open MPI Users <us...@open-mpi.org>
> List-Id: Open MPI Users <users.open-mpi.org>
> List-Unsubscribe: <http://www.open-mpi.org/mailman/options.cgi/users>,
>     <mailto:users-requ...@open-mpi.org?subject=unsubscribe>
> List-Archive: <http://www.open-mpi.org/MailArchives/users>
> List-Post: <mailto:us...@open-mpi.org>
> List-Help: <mailto:users-requ...@open-mpi.org?subject=help>
> List-Subscribe: <http://www.open-mpi.org/mailman/listinfo.cgi/users>,
>     <mailto:users-requ...@open-mpi.org?subject=subscribe>
> Content-Transfer-Encoding: 7bit
> Content-Type: text/plain; charset="us-ascii"; Format="flowed"
> Sender: users-boun...@open-mpi.org
> Errors-To: users-boun...@open-mpi.org
> Status: O
> X-UID: 88093
> Content-Length: 6268
> X-Keywords:
>
>
> Hi Samuel
>
> Samuel K. Gutierrez wrote:
>> Hi Gus,
>>
>> This may not help, but it's worth a try. If it's not too much trouble,
>> can you please reconfigure your Open MPI installation with
>> --enable-debug and then rebuild? After that, may we see the stack trace
>> from a core file that is produced after the segmentation fault?
>>
>> Thanks,
>>
>> --
>> Samuel K. Gutierrez
>> Los Alamos National Laboratory
>>
>
> Thank you for the suggestion.
>
> I am a bit reluctant to try this because when it fails,
> it *really* fails.
> Most of the times the machine doesn't even return the prompt,
> and in all cases it freezes and requires a hard reboot.
> It is not a segfault that the OS can catch, I guess.
> I wonder if enabling debug mode would do much for us,
> and get to the point of dumping a core, or just die before that.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>> On May 6, 2010, at 12:01 PM, Gus Correa wrote:
>>
>>> Hi Eugene
>>>
>>> Thanks for the detailed answer.
>>>
>>> *************
>>>
>>> 1) Now I can see and use the btl_sm_num_fifos component:
>>>
>>> I had committed already "btl = ^sm" to the openmpi-mca-params.conf
>>> file. This apparently hides the btl_sm_num_fifos from ompi_info.
>>>
>>> After I switched to no options in openmpi-mca-params.conf,
>>> then ompi_info showed the btl_sm_num_fifos component.
>>>
>>> ompi_info --all | grep btl_sm_num_fifos
>>> MCA btl: parameter "btl_sm_num_fifos" (current value:
>>> "1", data source: default value)
>>>
>>> A side comment:
>>> This means that the system administrator can
>>> hide some Open MPI options from the users, depending on what
>>> he puts in the openmpi-mca-params.conf file, right?
>>>
>>> *************
>>>
>>> 2) However, running with "sm" still breaks, unfortunately:
>>>
>>> Boomer!
>>> I get the same errors that I reported in my very
>>> first email, if I increase the number of processes to 16,
>>> to explore the hyperthreading range.
>>>
>>> This is using "sm" (i.e. not excluded in the mca config file),
>>> and btl_sm_num_fifos (mpiexec command line)
>>>
>>> The machine hangs, requires a hard reboot, etc, etc,
>>> as reported earlier. See the below, please.
>>>
>>> So, I guess the conclusion is that I can use sm,
>>> but I have to remain within the range of physical cores (8),
>>> not oversubscribe, not try to explore the HT range.
>>> Should I expect it to work also for np>number of physical cores?
>>>
>>> I wonder if this would still work with np<=8, but with heavier code.
>>> (I only used hello_c.c so far.)
>>> Not sure I'll be able to test this, the user wants to use the machine.
>>>
>>>
>>> $mpiexec -mca btl_sm_num_fifos 4 -np 4 a.out
>>> Hello, world, I am 0 of 4
>>> Hello, world, I am 1 of 4
>>> Hello, world, I am 2 of 4
>>> Hello, world, I am 3 of 4
>>>
>>> $ mpiexec -mca btl_sm_num_fifos 8 -np 8 a.out
>>> Hello, world, I am 0 of 8
>>> Hello, world, I am 1 of 8
>>> Hello, world, I am 2 of 8
>>> Hello, world, I am 3 of 8
>>> Hello, world, I am 4 of 8
>>> Hello, world, I am 5 of 8
>>> Hello, world, I am 6 of 8
>>> Hello, world, I am 7 of 8
>>>
>>> $ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
>>> --------------------------------------------------------------------------
>>>
>>> mpiexec noticed that process rank 8 with PID 3659 on node
>>> spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>>
>>> $
>>>
>>> Message from syslogd@spinoza at May 6 13:38:13 ...
>>> kernel:------------[ cut here ]------------
>>>
>>> Message from syslogd@spinoza at May 6 13:38:13 ...
>>> kernel:invalid opcode: 0000 [#1] SMP
>>>
>>> Message from syslogd@spinoza at May 6 13:38:13 ...
>>> kernel:last sysfs file:
>>> /sys/devices/system/cpu/cpu15/topology/physical_package_id
>>>
>>> Message from syslogd@spinoza at May 6 13:38:13 ...
>>> kernel:Stack:
>>>
>>> Message from syslogd@spinoza at May 6 13:38:13 ...
>>> kernel:Call Trace:
>>>
>>> Message from syslogd@spinoza at May 6 13:38:13 ...
>>> kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00
>>> 4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73
>>> 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01
>>>
>>> *****************
>>>
>>> Many thanks,
>>> Gus Correa
>>> ---------------------------------------------------------------------
>>> Gustavo Correa
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>>>
>>>
>>> Eugene Loh wrote:
>>>> Gus Correa wrote:
>>>>> Hi Eugene
>>>>>
>>>>> Thank you for answering one of my original questions.
>>>>>
>>>>> However, there seems to be a problem with the syntax.
>>>>> Is it really "-mca btl btl_sm_num_fifos=some_number"?
>>>> No. Try "--mca btl_sm_num_fifos 4". Or,
>>>> % setenv OMPI_MCA_btl_sm_num_fifos 4
>>>> % ompi_info -a | grep btl_sm_num_fifos # check that things were
>>>> set correctly
>>>> % mpirun -n 4 a.out
>>>>> When I grep any component starting with btl_sm I get nothing:
>>>>>
>>>>> ompi_info --all | grep btl_sm
>>>>> (No output)
>>>> I'm no guru, but I think the reason has something to do with
>>>> dynamically loaded somethings. E.g.,
>>>> % /home/eugene/ompi/bin/ompi_info --all | grep btl_sm_num_fifos
>>>> (no output)
>>>> % setenv OPAL_PREFIX /home/eugene/ompi
>>>> % set path = ( $OPAL_PREFIX/bin $path )
>>>> % ompi_info --all | grep btl_sm_num_fifos
>>>> MCA btl: parameter "btl_sm_num_fifos" (current value:
>>>> "1", data source: default value)
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> Return-Path: <users-boun...@open-mpi.org>
> X-Original-To: gabr...@cs.uh.edu
> Delivered-To: gabr...@cs.uh.edu
> Received: from localhost (dijkstra.cs.uh.edu [127.0.0.1])
>     by dijkstra.cs.uh.edu (Postfix) with ESMTP id DCA5923CB75
>     for <gabr...@cs.uh.edu>; Thu, 6 May 2010 16:34:49 -0500 (CDT)
> X-Virus-Scanned: amavisd-new at cs.uh.edu
> Received: from dijkstra.cs.uh.edu ([127.0.0.1])
>     by localhost (dijkstra.cs.uh.edu [127.0.0.1]) (amavisd-new, port 10024)
>     with ESMTP id 9aIlmplg06In for <gabr...@cs.uh.edu>;
>     Thu, 6 May 2010 16:34:47 -0500 (CDT)
> Received: from milliways.osl.iu.edu (milliways.osl.iu.edu [129.79.245.239])
>     by dijkstra.cs.uh.edu (Postfix) with ESMTP id 674E823CB74
>     for <gabr...@cs.uh.edu>; Thu, 6 May 2010 16:34:47 -0500 (CDT)
> Received: from milliways.osl.iu.edu (localhost [127.0.0.1])
>     by milliways.osl.iu.edu (8.13.1/8.13.1/IUCS_2.92) with ESMTP id
>     o46LYYhG022846;
>     Thu, 6 May 2010 17:34:34 -0400
> Received: from rtp-iport-2.cisco.com (rtp-iport-2.cisco.com [64.102.122.149])
>     by milliways.osl.iu.edu (8.13.1/8.13.1/IUCS_2.92) with ESMTP id
>     o46LYSQt022842
>     for <us...@open-mpi.org>; Thu, 6 May 2010 17:34:32 -0400
> Authentication-Results: rtp-iport-2.cisco.com;
>     dkim=neutral (message not signed) header.i=none
> X-IronPort-Anti-Spam-Filtered: true
> X-IronPort-Anti-Spam-Result: AvsEAKbS4ktAZnwM/2dsb2JhbACeAnGja5lbhRME
> X-IronPort-AV: E=Sophos;i="4.52,343,1270425600"; d="scan'208";a="108847076"
> Received: from rtp-core-1.cisco.com ([64.102.124.12])
>     by rtp-iport-2.cisco.com with ESMTP; 06 May 2010 21:34:28 +0000
> Received: from rtp-jsquyres-8714.cisco.com (rtp-jsquyres-8714.cisco.com
>     [10.116.19.197])
>     by rtp-core-1.cisco.com (8.13.8/8.14.3) with ESMTP id o46LYQO2004203
>     for <us...@open-mpi.org>; Thu, 6 May 2010 21:34:28 GMT
> Mime-Version: 1.0 (Apple Message framework v1078)
> From: Jeff Squyres <jsquy...@cisco.com>
> In-Reply-To: <4be303ff.4020...@ldeo.columbia.edu>
> Date: Thu, 6 May 2010 17:34:26 -0400
> Message-Id: <2b9fc527-eacd-47da-bc26-45332247e...@cisco.com>
> References: <4be08f2a.6000...@ldeo.columbia.edu>
>     <441acf3b-34a9-4ff5-b78e-b9a8df4e8...@cisco.com>
>     <4be09531.9040...@ldeo.columbia.edu>
>     <0d4abfdd-9802-4d77-bf70-c7ec3198f...@open-mpi.org>
>     <4be0a505.2000...@ldeo.columbia.edu> <4be0cb62.7080...@ldeo.columbia.edu>
>     <10b2585f-576a-4b18-a83e-e8e165823...@cisco.com>
>     <4be1ab3a.4010...@ldeo.columbia.edu>
>     <9a3fcc9c-56de-4de4-a781-460ccc083...@open-mpi.org>
>     <4be1ec79.3030...@ldeo.columbia.edu> <20100505235456.GA5622@sopalepc>
>     <7dc1d35d-11c6-4f4b-870a-031ff11f7...@open-mpi.org>
>     <4be2d427.4090...@oracle.com> <4be2f269.1090...@ldeo.columbia.edu>
>     <4be2f857.9090...@oracle.com> <4be303ff.4020...@ldeo.columbia.edu>
> To: Open MPI Users <us...@open-mpi.org>
> X-Mailer: Apple Mail (2.1078)
> X-PMX-Version: 5.5.9.388399, Antispam-Engine: 2.7.2.376379,
>     Antispam-Data: 2010.5.6.211515
> X-PerlMx-Spam: Gauge=IIIIIIII, Probability=8%, Report='
>     SUPERLONG_LINE 0.05, BODY_SIZE_4000_4999 0, BODY_SIZE_5000_LESS 0,
>     BODY_SIZE_7000_LESS 0, __BOUNCE_CHALLENGE_SUBJ 0,
>     __BOUNCE_NDR_SUBJ_EXEMPT 0, __CP_URI_IN_BODY 0, __CT 0, __CTE 0,
>     __CT_TEXT_PLAIN 0, __HAS_MSGID 0, __HAS_X_MAILER 0,
>     __MIME_TEXT_ONLY 0, __MIME_VERSION 0, __MIME_VERSION_APPLEMAIL 0,
>     __MSGID_APPLEMAIL 0, __SANE_MSGID 0, __TO_MALFORMED_2 0,
>     __URI_NS , __USER_AGENT_APPLEMAIL 0, __X_MAILER_APPLEMAIL 0'
> X-MIME-Autoconverted: from quoted-printable to 8bit by milliways.osl.iu.edu id
>     o46LYSQt022842
> Subject: Re: [OMPI users] How do I run OpenMPI safely on a
>     Nehalem standalone machine?
> X-BeenThere: us...@open-mpi.org
> X-Mailman-Version: 2.1.11rc1
> Precedence: list
> Reply-To: Open MPI Users <us...@open-mpi.org>
> List-Id: Open MPI Users <users.open-mpi.org>
> List-Unsubscribe: <http://www.open-mpi.org/mailman/options.cgi/users>,
>     <mailto:users-requ...@open-mpi.org?subject=unsubscribe>
> List-Archive: <http://www.open-mpi.org/MailArchives/users>
> List-Post: <mailto:us...@open-mpi.org>
> List-Help: <mailto:users-requ...@open-mpi.org?subject=help>
> List-Subscribe: <http://www.open-mpi.org/mailman/listinfo.cgi/users>,
>     <mailto:users-requ...@open-mpi.org?subject=subscribe>
> Content-Type: text/plain; charset="us-ascii"
> Content-Transfer-Encoding: 7bit
> Sender: users-boun...@open-mpi.org
> Errors-To: users-boun...@open-mpi.org
> Status: O
> X-UID: 88094
> Content-Length: 4035
> X-Keywords:
>
>
> On May 6, 2010, at 2:01 PM, Gus Correa wrote:
>
>> 1) Now I can see and use the btl_sm_num_fifos component:
>>
>> I had committed already "btl = ^sm" to the openmpi-mca-params.conf
>> file. This apparently hides the btl_sm_num_fifos from ompi_info.
>>
>> After I switched to no options in openmpi-mca-params.conf,
>> then ompi_info showed the btl_sm_num_fifos component.
>>
>> ompi_info --all | grep btl_sm_num_fifos
>> MCA btl: parameter "btl_sm_num_fifos" (current value: "1",
>> data source: default value)
>>
>> A side comment:
>> This means that the system administrator can
>> hide some Open MPI options from the users, depending on what
>> he puts in the openmpi-mca-params.conf file, right?
>
> Correct.
>
> BUT: a user can always override the "btl" MCA param and see them again. For
> example, you could also have done this:
>
>     echo "btl =" > ~/.openmpi/mca-params.conf
>     ompi_info --all | grep btl_sm_num_fifos
>     # ...will show the sm params...
>
>> 2) However, running with "sm" still breaks, unfortunately:
>>
>> Boomer!
>
> Doh!
>
>> I get the same errors that I reported in my very
>> first email, if I increase the number of processes to 16,
>> to explore the hyperthreading range.
>>
>> This is using "sm" (i.e. not excluded in the mca config file),
>> and btl_sm_num_fifos (mpiexec command line)
>>
>> The machine hangs, requires a hard reboot, etc, etc,
>> as reported earlier. See the below, please.
>
> I saw that only some probably-unrelated dmesg messages were emitted. Was
> there anything else revealing on the console and/or /var/log/* files? Hard
> reboots absolutely should not be caused by Open MPI.
>
>> So, I guess the conclusion is that I can use sm,
>> but I have to remain within the range of physical cores (8),
>> not oversubscribe, not try to explore the HT range.
>> Should I expect it to work also for np>number of physical cores?
>
> Your prior explanations of when HT is useful seemed pretty reasonable to me.
> Meaning: Nehalem HT will help only in some kinds of codes. Dense computation
> codes with few conditional branches may not benefit much from HT.
>
> But OMPI applications should always run *correctly*, regardless of HT or
> not-HT -- even if you're oversubscribing. The performance may suffer
> (sometimes dramatically) if you oversubscribe physical cores with dense
> computational code, but it should always run *correctly*.
>
>> I wonder if this would still work with np<=8, but with heavier code.
>> (I only used hello_c.c so far.)
>
> If hello_c is crashing your computer - even if you're running np>8 or np>16
> -- something is wrong outside of Open MPI. I routinely run np=100 hello_c on
> machines.
>
>> $ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
>> --------------------------------------------------------------------------
>> mpiexec noticed that process rank 8 with PID 3659 on node
>> spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> $
>>
>> Message from syslogd@spinoza at May 6 13:38:13 ...
>> kernel:------------[ cut here ]------------
>>
>> Message from syslogd@spinoza at May 6 13:38:13 ...
>> kernel:invalid opcode: 0000 [#1] SMP
>>
>> Message from syslogd@spinoza at May 6 13:38:13 ...
>> kernel:last sysfs file:
>> /sys/devices/system/cpu/cpu15/topology/physical_package_id
>>
>> Message from syslogd@spinoza at May 6 13:38:13 ...
>> kernel:Stack:
>>
>> Message from syslogd@spinoza at May 6 13:38:13 ...
>> kernel:Call Trace:
>>
>> Message from syslogd@spinoza at May 6 13:38:13 ...
>> kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00 4c 89
>> e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb
>> fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01
>
> I unfortunately don't know what these messages mean...
>
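
For reference, the thread above sets btl_sm_num_fifos in two ways: on the mpiexec command line and through the OMPI_MCA_btl_sm_num_fifos environment variable. A minimal sketch of both in POSIX sh follows (Eugene's commands use csh's setenv instead). NP and FIFOS are hypothetical helper variables introduced only for this illustration, not Open MPI names, and the ompi_info check is skipped when Open MPI is not on the PATH.

```shell
# Sketch only: sh equivalents of the csh commands quoted in this thread.
# NP and FIFOS are hypothetical helper variables, not Open MPI names.
NP=4
FIFOS=4

# Environment form of the parameter (Eugene's "setenv" line in sh syntax).
OMPI_MCA_btl_sm_num_fifos=$FIFOS
export OMPI_MCA_btl_sm_num_fifos

# Command-line form, as used in Gus's test runs. It is only built and
# printed here, not executed, so it can be reviewed first.
CMD="mpiexec -mca btl_sm_num_fifos $FIFOS -np $NP a.out"
echo "$CMD"

# Query the parameter only if ompi_info is installed. grep may print
# nothing if "btl = ^sm" is still set in an MCA params file.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --all | grep btl_sm_num_fifos || true
fi
```

The values mirror the np=4 run that completed in Gus's tests. The printed command can then be run by hand once the setting has been checked.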
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335