Forgot to attach...
Brian Christiansen wrote:
There could be something more than what was pointed out and fixed
earlier. Here is a patch with the changes that were checked in for this,
or a similar, issue. Check that your version has it applied. If that
doesn't help, try compiling as 32-bit, as Garrick suggested.
Brian
Wickliffe, Blake W wrote:
Right...but my suspicion is that we are facing something else, since Brian
claims the issue with the 32/64bit was fixed in 3.2.6p21, which we are already
on.
Unless I am misunderstanding you?
Blake Wickliffe
Saudi Aramco
ENOD/CSYS/USG HPC Team
(873-4417)
-----Original Message-----
From: Garrick [mailto:[email protected]]
Sent: Tuesday, August 18, 2009 7:35 AM
To: Wickliffe, Blake W
Cc: Maui Users
Subject: Re: [Mauiusers] Corrupt node feature list
That's fine. 32bit maui build works fine on 64bit host talking to a
64bit pbs_server.
HPCC/Linux Systems Admin
On Aug 17, 2009, at 9:23 PM, "Wickliffe, Blake W" <[email protected]> wrote:
Unfortunately, we are already using 3.2.6p21, and it is on a 64-bit
system. So, if that's the case, even reverting back to 32-bit might
not work.
Blake Wickliffe
Saudi Aramco
ENOD/CSYS/USG HPC Team
(873-4417)
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Brian Christiansen
Sent: Monday, August 17, 2009 9:27 PM
To: Maui Users
Subject: Re: [Mauiusers] Corrupt node feature list
There was an issue, previously, where you could only have 32 node
features on a 64bit system without seeing side effects. If you aren't
using the latest snapshot, you could try it and see if it helps.
From the changelog:
Maui 3.2.6p21
- Fixed 64bit issue. Maui assumed ints were always 8 bytes for 64bit
systems even though x86_64 ints are still 4 bytes. This lead to aliasing
of large indexed node properties to smaller indexed properties. Maui now
triggers off of sizeof(int). Thanks goes to Alexis Cousein.
Brian Christiansen
Garrick Staples wrote:
Please start new threads with the "new" button in your email
client, not with the "reply" button.
On Mon, Aug 17, 2009 at 04:04:21PM +0300, Wickliffe, Blake W alleged:
Hi all,
Has anyone experienced a problem with Maui corrupting the features
list of nodes after a certain number of nodes are added?
On our cluster, we have 2336 nodes, most of which have only 1
"Feature" or "Property" in the Torque parlance. However,
immediately upon adding another node, we start seeing things like:
Features: [[NONE]][checki][datai]
When doing a "checknode" on various nodes. The problem only gets
worse and more extensive as further nodes are added. Deleting the
nodes from the qmgr brings everything back to normal.
Any ideas?
Yes, I've seen this with 64bit builds. Build maui 32bit and it
won't happen.
---
---------------------------------------------------------------------
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
The contents of this email, including all related responses, files
and attachments transmitted with it (collectively referred to as
"this Email"), are intended solely for the use of the individual/
entity to whom/which they are addressed, and may contain
confidential and/or legally privileged information. This Email may
not be disclosed or forwarded to anyone else without authorization
from the originator of this Email. If you have received this Email
in error, please notify the sender immediately and delete all copies
from your system. Please note that the views or opinions presented
in this Email are those of the author and may not necessarily
represent those of Saudi Aramco. The recipient should check this
Email and any attachments for the presence of any viruses. Saudi
Aramco accepts no liability for any damage caused by any virus/error
transmitted by this Email.
Index: include/moab.h
===================================================================
--- include/moab.h (revision 107)
+++ include/moab.h (revision 108)
@@ -125,6 +125,7 @@
#define M32UINT4 unsigned long
#define M32UINT8 unsigned long long
+/* ints on x86_64 are still 4 bytes */
#ifdef __M64
#define MINTBITS 64
#define MINTLBITS 6
Index: include/moab-proto.h
===================================================================
--- include/moab-proto.h (revision 107)
+++ include/moab-proto.h (revision 108)
@@ -453,6 +453,7 @@
int MSysDestroyObjects(void);
int MSysDiagnose(char *,int,long);
int MSysStartServer(int);
+int M64Init(m64_t *);
Index: CHANGELOG
===================================================================
--- CHANGELOG (revision 107)
+++ CHANGELOG (revision 108)
@@ -1,5 +1,7 @@
Maui 3.2.6p21
- Fixed CHECKSUM authentication for maui + slurm. Thanks goes to Eyegene Ryabinkin.
+ - Fixed 64bit issue. Maui assumed ints were always 8 bytes for 64bit systems even though x86_64 ints are still 4 bytes. This lead to aliasing of large indexed node properties to smaller indexed properties. Maui now triggers off of sizeof(int). Thanks goes to Alexis Cousein.
+ - Fixed an optimiztion issue with x86_64 systems. -O2 was optimizing out parts of the communication strings.
Maui 3.2.6p20
- Fixed a potential security issue when Maui is used with some PBS configurations.
Index: src/server/OUserI.c
===================================================================
--- src/server/OUserI.c (revision 107)
+++ src/server/OUserI.c (revision 108)
@@ -33,6 +33,8 @@
long tmpL;
+ char tmpLine[MMAX_LINE];
+
const char *FName = "UIProcessCommand";
DBG(3,fUI) DPrint("%s(S)\n",
@@ -411,16 +413,16 @@
S->SBufSize = (long)sizeof(SBuffer);
- sprintf(S->SBuffer,"%s%d ",
- MCKeyword[mckStatusCode],
- scFAILURE);
+ sprintf(tmpLine,"%s%d ",
+ MCKeyword[mckStatusCode],
+ scFAILURE);
- Align = (int)strlen(S->SBuffer) + (int)strlen(MCKeyword[mckArgs]);
+ Align = (int)strlen(tmpLine) + (int)strlen(MCKeyword[mckArgs]);
sprintf(S->SBuffer,"%s%*s%s",
- S->SBuffer,
- 16 - (Align % 16),
- " ",
+ tmpLine,
+ 16 - (Align % 16),
+ " ",
MCKeyword[mckArgs]);
HeadSize = (int)strlen(SBuffer);
@@ -429,7 +431,7 @@
if (Function[sindex] != NULL)
scode = (*Function[sindex])(args,S->SBuffer + HeadSize,FLAGS,Auth,&S->SBufSize);
else
- scode = FAILURE;
+ scode = FAILURE;
ptr = S->SBuffer + strlen(MCKeyword[mckStatusCode]);
Index: src/server/mclient.c
===================================================================
--- src/server/mclient.c (revision 107)
+++ src/server/mclient.c (revision 108)
@@ -10,6 +10,7 @@
#define MAX_MCARGS 128
extern mattrlist_t MAList;
+extern m64_t M64;
int MCResCreate(char *);
int MCJobShow(char *);
@@ -563,6 +564,8 @@
DBG(2,fALL) DPrint("%s()\n",
FName);
+ M64Init(&M64);
+
MUBuildPList(MCfg,MParam);
strcpy(C.ServerHost,DEFAULT_MSERVERHOST);
Index: src/mcom/MSU.c
===================================================================
--- src/mcom/MSU.c (revision 107)
+++ src/mcom/MSU.c (revision 108)
@@ -1303,9 +1303,11 @@
if (DoSocketLayerAuth == TRUE)
{
+ char tmpStr[MMAX_BUFFER];
+
time(&Now);
- sprintf(TSLine,"%s%ld %s%s",
+ sprintf(tmpStr,"%s%ld %s%s",
MCKeyword[mckTimeStamp],
(long)Now,
MCKeyword[mckAuth],
@@ -1320,7 +1322,7 @@
}
sprintf(TSLine,"%s %s",
- TSLine,
+ tmpStr,
MCKeyword[mckData]);
MSecGetChecksum2(
Index: src/mcom/MSec.c
===================================================================
--- src/mcom/MSec.c (revision 107)
+++ src/mcom/MSec.c (revision 108)
@@ -130,7 +130,6 @@
-#ifndef __M32COMPAT
int M64Init(
@@ -143,10 +142,10 @@
M->Is64 = FALSE;
- M->INTBC = M32INTBITS;
- M->INTLBC = M32INTLBITS;
- M->MIntSize = M32INTSIZE;
- M->IntShift = M32INTSHIFT;
+ M->INTBITS = M32INTBITS;
+ M->INTLBITS = M32INTLBITS;
+ M->INTSIZE = M32INTSIZE;
+ M->INTSHIFT = M32INTSHIFT;
}
else
{
@@ -154,10 +153,10 @@
M->Is64 = TRUE;
- M->INTBC = M64INTBITS;
- M->INTLBC = M64INTLBITS;
- M->MIntSize = M64INTSIZE;
- M->IntShift = M64INTSHIFT;
+ M->INTBITS = M64INTBITS;
+ M->INTLBITS = M64INTLBITS;
+ M->INTSIZE = M64INTSIZE;
+ M->INTSHIFT = M64INTSHIFT;
}
MDB(5,fSTRUCT) MLog("INFO: 64Bit enabled: %s UINT4[%d] UINT8[%d]\n",
@@ -168,7 +167,6 @@
return(SUCCESS);
} /* END M64Init() */
-#endif /* !__M32COMPAT */
Index: src/moab/MPar.c
===================================================================
--- src/moab/MPar.c (revision 107)
+++ src/moab/MPar.c (revision 108)
@@ -23,6 +23,7 @@
extern mrm_t MRM[];
extern mstat_t MStat;
extern mattrlist_t MAList;
+extern m64_t M64;
extern const char *MQALType[];
extern const char *MResourceType[];
@@ -1252,7 +1253,7 @@
{
P = &MPar[pindex];
- if (!(BM[pindex >> MINTLBITS] & (1 << (pindex % MINTBITS))))
+ if (!(BM[pindex >> M64.INTLBITS] & (1 << (pindex % M64.INTBITS))))
continue;
if (P->Name[0] == '\0')
Index: src/moab/MTrace.c
===================================================================
--- src/moab/MTrace.c (revision 107)
+++ src/moab/MTrace.c (revision 108)
@@ -25,6 +25,7 @@
extern mqos_t MQOS[];
extern mpar_t MPar[];
extern mrm_t MRM[];
+extern m64_t M64;
extern mframe_t MFrame[];
@@ -1219,7 +1220,7 @@
J->SpecFlags |= MSim.TraceDefaultJobFlags;
- for (index = 0;index < MINTBITS;index++)
+ for (index = 0;index < M64.INTBITS;index++)
{
if (!(MSim.TraceIgnoreJobFlags & (1 << index)))
continue;
Index: src/moab/MQOS.c
===================================================================
--- src/moab/MQOS.c (revision 107)
+++ src/moab/MQOS.c (revision 108)
@@ -17,6 +17,7 @@
extern mgcred_t MGroup[];
extern mgcred_t MAcct[];
extern mclass_t MClass[];
+extern m64_t M64;
extern const char *MQOSFlags[];
extern const char *MQALType[];
@@ -896,7 +897,7 @@
for (bindex = 0;bindex < MAX_MQOS;bindex++)
{
- if (!(BM[bindex >> MINTLBITS] & (1 << (bindex % MINTBITS))))
+ if (!(BM[bindex >> M64.INTLBITS] & (1 << (bindex % M64.INTBITS))))
continue;
Q = &MQOS[bindex];
Index: src/moab/MSys.c
===================================================================
--- src/moab/MSys.c (revision 107)
+++ src/moab/MSys.c (revision 108)
@@ -38,6 +38,7 @@
mrmfunc_t MRMFunc[MAX_MRMTYPE];
msim_t MSim;
msys_t MSys; /* cluster layout */
+m64_t M64;
mx_t X;
int MFQ[MAX_MJOB]; /* terminated by '-1' value */
@@ -98,6 +99,8 @@
S->X = (void *)&X;
+ M64Init(&M64);
+
MOSSyslogInit(S);
MUBuildPList((mcfg_t *)MCfg,MParam);
Index: src/moab/MSched.c
===================================================================
--- src/moab/MSched.c (revision 107)
+++ src/moab/MSched.c (revision 108)
@@ -19,6 +19,7 @@
extern mframe_t MFrame[];
extern mckpt_t MCP;
extern mres_t *MRes[];
+extern m64_t M64;
extern int MAQ[];
extern int MUIQ[];
@@ -2256,8 +2257,8 @@
for (sindex = 0;sindex < MaxSet;sindex++)
{
- if (N->FBM[SetIndex[sindex] >> MINTLBITS] &
- (1 << (SetIndex[sindex] % MINTBITS)))
+ if (N->FBM[SetIndex[sindex] >> M64.INTLBITS] &
+ (1 << (SetIndex[sindex] % M64.INTBITS)))
{
SetCount[sindex] += TC;
SetNC[sindex] ++;
@@ -2422,8 +2423,8 @@
{
case mrstFeature:
- if (N->FBM[SetIndex[sindex] >> MINTLBITS] &
- (1 << (SetIndex[sindex] % MINTBITS)))
+ if (N->FBM[SetIndex[sindex] >> M64.INTLBITS] &
+ (1 << (SetIndex[sindex] % M64.INTBITS)))
{
/* node is feasible */
@@ -2576,8 +2577,8 @@
{
case mrstFeature:
- if (N->FBM[SetIndex[BestSet] >> MINTLBITS] &
- (1 << (SetIndex[BestSet] % MINTBITS)))
+ if (N->FBM[SetIndex[BestSet] >> M64.INTLBITS] &
+ (1 << (SetIndex[BestSet] % M64.INTBITS)))
{
/* node is in set */
Index: src/moab/MUtil.c
===================================================================
--- src/moab/MUtil.c (revision 107)
+++ src/moab/MUtil.c (revision 108)
@@ -23,6 +23,7 @@
extern const char *MNodeState[];
extern const char *MHRObj[];
extern const char *MResourceType[];
+extern m64_t M64;
extern mx_t X;
@@ -788,7 +789,7 @@
return(SUCCESS);
}
- if ((AttrValue == NULL) || (MapSize < MINTSIZE))
+ if ((AttrValue == NULL) || (MapSize < M64.INTSIZE))
{
return(FAILURE);
}
@@ -805,7 +806,7 @@
if (!strcmp(MAList[AttrIndex][index],AttrValue))
{
if (AttrMap != NULL)
- AttrMap[index >> MINTLBITS] |= 1 << (index % MINTBITS);
+ AttrMap[index >> M64.INTLBITS] |= 1 << (index % M64.INTBITS);
return(SUCCESS);
}
@@ -822,7 +823,7 @@
MUStrCpy(MAList[AttrIndex][index],AttrValue,sizeof(MAList[0][0]));
- AttrMap[index >> MINTLBITS] |= 1 << (index % MINTBITS);
+ AttrMap[index >> M64.INTLBITS] |= 1 << (index % M64.INTBITS);
DBG(5,fSTRUCT) DPrint("INFO: added MAList[%s][%d]: '%s'\n",
MAttrType[AttrIndex],
@@ -1069,7 +1070,7 @@
Line[0] = '\0';
- for (i = 1;i < MINTBITS;i++)
+ for (i = 1;i < M64.INTBITS;i++)
{
if ((Value & (1 << i)) && (MAList[Attr][i][0] != '\0'))
{
@@ -1097,7 +1098,7 @@
int index;
int findex;
- if ((ValueMap == NULL) || (MapSize < MINTSIZE))
+ if ((ValueMap == NULL) || (MapSize < M64.INTSIZE))
{
strcpy(Line,NONE);
@@ -1106,16 +1107,16 @@
Line[0] = '\0';
- for (findex = 0;findex < (MapSize >> MINTSHIFT);findex++)
+ for (findex = 0;findex < (MapSize >> M64.INTSHIFT);findex++)
{
- for (index = 0;index < MINTBITS;index++)
+ for (index = 0;index < M64.INTBITS;index++)
{
if ((ValueMap[findex] & (1 << index)) &&
(MAList[AttrIndex][index][0] != '\0'))
{
sprintf(Line,"%s[%s]",
Line,
- MAList[AttrIndex][index + findex * MINTBITS]);
+ MAList[AttrIndex][index + findex * M64.INTBITS]);
}
} /* END for (index) */
} /* END for (findex) */
@@ -1152,7 +1153,7 @@
return(NULL);
}
- if ((ValueMap == NULL) || (MapSize < MINTSIZE))
+ if ((ValueMap == NULL) || (MapSize < M64.INTSIZE))
{
return(NULL);
}
@@ -1162,7 +1163,7 @@
for (findex = 0;findex < (MapSize >> 2);findex++)
{
- for (index = 0;index < MINTBITS;index++)
+ for (index = 0;index < M64.INTBITS;index++)
{
if ((ValueMap[findex] & (1 << index)) &&
(MAList[AttrIndex][index][0] != '\0'))
@@ -1217,7 +1218,7 @@
return(Line);
}
- for (i = 1;i < MINTBITS;i++)
+ for (i = 1;i < M64.INTBITS;i++)
{
if ((Value & (1 << i)) && (MAList[Attr][i][0] != '\0'))
{
@@ -4121,7 +4122,7 @@
ptr[0] = '\0';
- for (i = 1;i < MINTBITS;i++)
+ for (i = 1;i < M64.INTBITS;i++)
{
if ((BM & (1 << i)) && (AList[i] != NULL) && (AList[i][0] != '\0'))
{
@@ -4252,7 +4253,7 @@
int mindex;
int len;
- len = MAX(1,(MapSize >> MINTLBITS));
+ len = MAX(1,(MapSize >> M64.INTLBITS));
for (mindex = 0;mindex < len;mindex++)
{
@@ -4275,7 +4276,7 @@
int mindex;
int len;
- len = MAX(1,(MapSize >> MINTLBITS));
+ len = MAX(1,(MapSize >> M64.INTLBITS));
for (mindex = 0;mindex < len;mindex++)
{
@@ -5413,7 +5414,7 @@
char *ptr;
- if ((ValueMap == NULL) || (MapSize < MINTSIZE))
+ if ((ValueMap == NULL) || (MapSize < M64.INTSIZE))
{
strcpy(Line,NONE);
@@ -5422,14 +5423,14 @@
Line[0] = '\0';
- for (findex = 0;findex < (MapSize >> MINTSHIFT);findex++)
+ for (findex = 0;findex < (MapSize >> M64.INTSHIFT);findex++)
{
- for (index = 0;index < MINTBITS;index++)
+ for (index = 0;index < M64.INTBITS;index++)
{
if ((ValueMap[findex] & (1 << index)) &&
(MAList[AttrIndex][index][0] != '\0'))
{
- ptr = MAList[AttrIndex][index + findex * MINTBITS];
+ ptr = MAList[AttrIndex][index + findex * M64.INTBITS];
if (Delim != '\0')
{