Hello,

The machine is:

    # uname -a
    Linux n00 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

It runs slurm 1.3.8. I've installed maui (maui-3.2.6p21-snap.1243977349.tar.gz; Eygene Ryabinkin's correction is included, I checked the sources).

Configured with:

    ./configure --prefix=/opt/maui --mandir=/usr/share/man --with-spooldir=/opt/maui --with-machine=n00 --with-key=78 --with-wiki

Created the file /etc/wiki.conf with the line:

    AuthKey=78

Added to slurm.conf:

    SchedulerType=sched/wiki
    SchedulerPort=7321

But in the end I still get "ALERT: checksum does not match".

In the log below I noticed two things:

1) For some packets, the number of bytes read differs from the length passed to MSecGetChecksum:

    06/26 13:34:47 INFO: 3704 of 3704 bytes read from sd 7
    06/26 13:34:47 MSecGetChecksum(Buf,3632,Checksum,DES,CSKey)

2) The request string in the alert appears to be cut off early (perhaps only to keep the log from being cluttered):

    06/26 13:34:47 ALERT: checksum does not match (e3743199c5566b9a:9ab1d151dd49049c) request 'TS=1246008887 AUTH=slurm DT=SC=0 ARG=17#191814:STATE=Running;TASKLIST=:n01;UPDATETIME=1246007985;WCLIMIT=31536000;TASKS='

The detailed maui log includes these lines:

    06/26 13:34:47 ServerProcessRequests()
    06/26 13:34:47 MLogRoll(NULL,0,1)
    06/26 13:34:47 INFO: not rolling logs (441447 < 10000000)
    06/26 13:34:47 MResAdjust(NULL,0,0)
    06/26 13:34:47 MJobSetAttr(,PAL,Value,1,2)
    06/26 13:34:47 INFO: job flags for job : 0, req napolicy=SHARED
    06/26 13:34:47 MJobSetAttr(,GAttr,Value,1,5)
    06/26 13:34:47 MStatInitializeActiveSysUsage()
    06/26 13:34:47 MStatClearUsage([NONE],Active)
    06/26 13:34:47 ServerUpdate()
    06/26 13:34:47 MSysUpdateTime()
    06/26 13:34:47 INFO: starting iteration 60
    06/26 13:34:47 MSchedProcessJobs()
    06/26 13:34:47 MRMGetInfo()
    06/26 13:34:47 MClusterClearUsage()
    06/26 13:34:47 MRMClusterQuery()
    06/26 13:34:47 MWikiClusterLoadInfo(n00,RCount,EMsg,SC)
    06/26 13:34:47 MWikiDoCommand(n00,7321,9000000,CHECKSUM,CMD=GETNODES ARG=0:ALL,Data,DataSize,SC)
    06/26 13:34:47 MSUConnect(S,FALSE,EMsg)
    06/26 13:34:47 INFO: trying to connect to 10.1.0.1 (Port: 7321)
    06/26 13:34:47 INFO: non-blocking mode established
    06/26 13:34:47 MSUSelectWrite(7,9000000)
    06/26 13:34:47 INFO: successful connect to TCP server (sd: 7)
    06/26 13:34:47 MSUSendData(S,9000000,TRUE,FALSE)
    06/26 13:34:47 MSecGetChecksum2(Buf1,27,Buf2,22,Checksum,DES,CSKey)
    06/26 13:34:47 INFO: header created '00000069CK=2c5f6971a5844eef TS=1246008887 AUTH=root DT='
    06/26 13:34:47 INFO: sending short packet '00000069CK=2c5f6971a5844eef TS=1246008887 AUTH=root DT=CMD=GETNODES ARG=0:ALL'
    06/26 13:34:47 MSUSendPacket(7,Buf,78,9000000,SC)
    06/26 13:34:47 MSUSelectWrite(7,9000000)
    06/26 13:34:47 INFO: packet sent (78 bytes of 78)
    06/26 13:34:47 INFO: command sent to server
    06/26 13:34:47 INFO: message sent: 'CMD=GETNODES ARG=0:ALL'
    06/26 13:34:47 MSURecvData(S,9000000,TRUE,SC,EMsg)
    06/26 13:34:47 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
    06/26 13:34:47 MSUSelectRead(7,9000000)
    06/26 13:34:47 INFO: 9 of 9 bytes read from sd 7
    06/26 13:34:47 MSURecvPacket(7,BufP,4435,NULL,9000000,SC)
    06/26 13:34:47 MSUSelectRead(7,9000000)
    06/26 13:34:47 INFO: 4435 of 4435 bytes read from sd 7
    06/26 13:34:47 MSecGetChecksum(Buf,4363,Checksum,DES,CSKey)
    06/26 13:34:47 ALERT: checksum does not match (351c7a893a2e1699:b4584308b241ec39) request 'TS=1246008887 AUTH=slurm DT=SC=0 ARG=64#n01:STATE=Running;ARCH=x86_64;OS=Linux;CMEMORY=10240;CDISK=0;CPROC=8;#n02:STATE='
    06/26 13:34:47 ERROR: cannot receive data from server n00:7321
    06/26 13:34:47 MSUDisconnect(S)
    06/26 13:34:47 ALERT: cannot get node list from WIKI RM
    06/26 13:34:47 ALERT: cannot load cluster resources on RM (RM 'n00' failed in function 'clusterquery')
    06/26 13:34:47 WARNING: no resources detected
    06/26 13:34:47 MRMWorkloadQuery()
    06/26 13:34:47 MWikiWorkloadQuery(n00,JCount,SC)
    06/26 13:34:47 MWikiDoCommand(n00,7321,9000000,CHECKSUM,CMD=GETJOBS ARG=0:ALL,Data,DataSize,SC)
    06/26 13:34:47 MSUConnect(S,FALSE,EMsg)
    06/26 13:34:47 INFO: trying to connect to 10.1.0.1 (Port: 7321)
    06/26 13:34:47 INFO: non-blocking mode established
    06/26 13:34:47 MSUSelectWrite(7,9000000)
    06/26 13:34:47 INFO: successful connect to TCP server (sd: 7)
    06/26 13:34:47 MSUSendData(S,9000000,TRUE,FALSE)
    06/26 13:34:47 MSecGetChecksum2(Buf1,27,Buf2,21,Checksum,DES,CSKey)
    06/26 13:34:47 INFO: header created '00000068CK=4e880ad31a667b74 TS=1246008887 AUTH=root DT='
    06/26 13:34:47 INFO: sending short packet '00000068CK=4e880ad31a667b74 TS=1246008887 AUTH=root DT=CMD=GETJOBS ARG=0:ALL'
    06/26 13:34:47 MSUSendPacket(7,Buf,77,9000000,SC)
    06/26 13:34:47 MSUSelectWrite(7,9000000)
    06/26 13:34:47 INFO: packet sent (77 bytes of 77)
    06/26 13:34:47 INFO: command sent to server
    06/26 13:34:47 INFO: message sent: 'CMD=GETJOBS ARG=0:ALL'
    06/26 13:34:47 MSURecvData(S,9000000,TRUE,SC,EMsg)
    06/26 13:34:47 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
    06/26 13:34:47 MSUSelectRead(7,9000000)
    06/26 13:34:47 INFO: 3704 of 3704 bytes read from sd 7
    06/26 13:34:47 MSecGetChecksum(Buf,3632,Checksum,DES,CSKey)
    06/26 13:34:47 ALERT: checksum does not match (e3743199c5566b9a:9ab1d151dd49049c) request 'TS=1246008887 AUTH=slurm DT=SC=0 ARG=17#191814:STATE=Running;TASKLIST=:n01;UPDATETIME=1246007985;WCLIMIT=31536000;TASKS='
    06/26 13:34:47 ERROR: cannot receive data from server n00:7321
    06/26 13:34:47 MSUDisconnect(S)
    06/26 13:34:47 ALERT: cannot get job list from WIKI RM
    06/26 13:34:47 ALERT: cannot load cluster workload on RM (RM 'n00' failed in function 'workloadquery')
    06/26 13:34:47 WARNING: no workload detected

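As a rough cross-check of observation 1, here is a small Python sketch that works only on strings copied from the log above. It pairs each "bytes read" line with the following MSecGetChecksum call and reports the difference, and it splits one logged outgoing packet into its 8-digit size prefix and payload. The helper names (read_vs_checksummed, split_packet) are made up for illustration and are not part of maui or slurm. The sketch says nothing about how either side actually computes the checksum.

```python
import re

# Lines copied from the maui log above (both failing exchanges).
LOG = """\
06/26 13:34:47 INFO: 4435 of 4435 bytes read from sd 7
06/26 13:34:47 MSecGetChecksum(Buf,4363,Checksum,DES,CSKey)
06/26 13:34:47 INFO: 3704 of 3704 bytes read from sd 7
06/26 13:34:47 MSecGetChecksum(Buf,3632,Checksum,DES,CSKey)
"""

# Outgoing packet exactly as quoted in the "sending short packet" log line.
SENT = ("00000069CK=2c5f6971a5844eef TS=1246008887 AUTH=root "
        "DT=CMD=GETNODES ARG=0:ALL")

READ_RE = re.compile(r"INFO:\s+(\d+) of \d+ bytes read")
# The "\(" right after the name keeps MSecGetChecksum2 lines from matching.
CKSUM_RE = re.compile(r"MSecGetChecksum\(Buf,(\d+),")


def read_vs_checksummed(log_text):
    """Return (bytes_read, bytes_checksummed, difference) per exchange."""
    pairs = []
    last_read = None
    for line in log_text.splitlines():
        m = READ_RE.search(line)
        if m:
            last_read = int(m.group(1))
            continue
        m = CKSUM_RE.search(line)
        if m and last_read is not None:
            n = int(m.group(1))
            pairs.append((last_read, n, last_read - n))
            last_read = None
    return pairs


def split_packet(logged):
    """Split a logged packet into (declared size, payload, text after CK=...)."""
    size = int(logged[:8])               # 8-digit decimal size prefix
    payload = logged[8:]                 # starts with "CK=<16 hex digits> "
    covered = payload.split(" ", 1)[1]   # everything after the CK field
    return size, payload, covered


if __name__ == "__main__":
    for read, summed, diff in read_vs_checksummed(LOG):
        print(f"read={read} checksummed={summed} diff={diff}")
    size, payload, covered = split_packet(SENT)
    print(f"declared={size} payload={len(payload)} after_CK={len(covered)}")
```

On these inputs the difference is 72 bytes in both exchanges, so the shorter length looks like a fixed offset rather than a random truncation. For the outgoing packet, the declared size (69) equals the payload length, and the text after the CK field is 27 + 22 = 49 bytes, matching the MSecGetChecksum2(Buf1,27,Buf2,22,...) call. The log reports 78 bytes sent, one more than 8 + 69; where that extra byte sits is not visible in the log.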
_______________________________________________
mauiusers mailing list
http://www.supercluster.org/mailman/listinfo/mauiusers
