OK, I see that MMAX_BUFFER is set to 65536 in mcom.h and MAX_MBUFFER is set to 65536 in msched-common.h.
Our node names are 8 characters long, and this job would be requesting 172 nodes specifically, so the node names alone would come to 172 x 8 = 1376 characters.

--
Steven DuChene

-----Original Message-----
From: Michel Béland [mailto:[email protected]]
Sent: Tuesday, November 29, 2011 7:22 AM
To: DuChene, StevenX A
Cc: [email protected]
Subject: Re: [Mauiusers] maui segfaults trying to schedule a job

DuChene, StevenX A wrote:
> This morning I discovered that the maui scheduler process was not
> running on one of our clusters like it should. When I try to start the
> maui process as the maui user I get a segmentation fault. In checking
> the log files the last few entries look like this:
>
> (...)
>
> There is only this one job in the queue on a 256 node cluster running
> torque 2.5.7 and maui 3.2.6p21
>
> I have tried starting the maui process within strace but I do not see
> any smoking gun in that strace output.
>
> I can probably get maui to start if I qdel the job but I was sort of
> hoping to see what was causing the problem in case any additional
> debugging output was needed.

I guess that you have more than 16 cores per node, so that your job requests more than 4096 cores. In that case, you have to increase MAX_MTASK in include/msched-common.h and recompile. It has to be equal to or greater than the number of cores in the cluster.

You also have to watch the size of MMAX_BUFFER in include/mcom.h and MAX_MBUFFER in include/msched-common.h. These define the size of the buffer that holds the exec_host string. For large clusters, it is too small, and large jobs will kill Maui after they have started execution. That is a good reason to keep node names short.

Other parameters to check are MAX_MNODE, MAX_MCLASS, MAX_MREQ_PER_JOB (or MMAX_REQ_PER_JOB), MAX_MRES, MAX_MNODE_PER_JOB, MAX_MNODE_PER_FRAG and MMAX_JOB.
--
Michel Béland, scientific computing analyst
[email protected]
Office S-250, Pavillon Roger-Gaudry (main building), Université de Montréal
Telephone: 514 343-6111 ext. 3892
Fax: 514 343-2155
Calcul Québec (www.calculquebec.ca)
Calcul Canada (calculcanada.org)
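For anyone following the buffer arithmetic in this thread, here is a rough sketch of how the exec_host length could be estimated before rebuilding. It assumes that exec_host lists one "name/index" entry per core, joined by "+", which is an assumption about the Torque format and not something stated above. The function names and the 24 cores per node are hypothetical, chosen only for illustration. The 4096 task limit and the 65536 buffer size are the values mentioned in the thread.

```python
def exec_host_length(num_nodes, name_len, cores_per_node):
    """Estimate the length of an exec_host string.

    Assumes one 'name/index' entry per core, joined by '+'
    (an assumed format, e.g. 'node0001/0+node0001/1+...').
    """
    per_node = 0
    for core in range(cores_per_node):
        # node name + '/' + decimal digits of the core index
        per_node += name_len + 1 + len(str(core))
    entries = num_nodes * cores_per_node
    # one '+' separator between consecutive entries
    return per_node * num_nodes + (entries - 1)


def fits(length, buffer_size):
    """True if the string plus a terminating NUL byte fits in the buffer."""
    return length + 1 <= buffer_size


# Values from the thread
NODES = 172
NAME_LEN = 8
BUFFER = 65536      # MMAX_BUFFER / MAX_MBUFFER as reported
TASK_LIMIT = 4096   # task limit implied in the reply

# Hypothetical value, not given in the thread
CORES_PER_NODE = 24

names_only = NODES * NAME_LEN
estimate = exec_host_length(NODES, NAME_LEN, CORES_PER_NODE)
tasks = NODES * CORES_PER_NODE

print("node names only:", names_only)
print("estimated exec_host length:", estimate)
print("fits in buffer:", fits(estimate, BUFFER))
print("tasks exceed limit:", tasks > TASK_LIMIT)
```

Under these assumptions the per-core entries, not the node names alone, dominate the string length, so the core count per node matters as much as the name length when sizing the buffer.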
