Ouch... to quote Adam Savage, "well there's yer problem". Are you perhaps running a version of GPFS 4.1 older than 4.1.1.9? It looks like an LROC-related assert was fixed in 4.1.1.9, but I can't find details on it.
From: Matt Weil
Sent: 12/28/16, 5:21 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] LROC

yes

> Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state != ssdActive) in line 427 of file /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C
> Wed Dec 28 16:17:07.508 2016: [E] *** Traceback:
> Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5 logAssertFailed + 0x2D5 at ??:0
> Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947 fs_config_ssds(fs_config*) + 0x867 at ??:0
> Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749 SFSConfigLROC() + 0x189 at ??:0
> Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0
> Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41 NsdDiskConfig::reReadConfig() + 0x771 at ??:0
> Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E runTSControl(int, int, char**) + 0x80E at ??:0
> Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5 RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0
> Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36 HandleCmdMsg(void*) + 0x1216 at ??:0
> Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172 Thread::callBody(Thread*) + 0x1E2 at ??:0
> Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
> Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5 start_thread + 0xC5 at ??:0
> Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone + 0x6D at ??:0
> mmfsd: /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `ssd->state != ssdActive' failed.
> Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7 in process 125345, link reg 0xFFFFFFFFFFFFFFFF.
> Wed Dec 28 16:17:07.522 2016: [I] rax 0x0000000000000000 rbx 0x00007FF15FD71000
> Wed Dec 28 16:17:07.523 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006
> Wed Dec 28 16:17:07.524 2016: [I] rsp 0x00007FEF34FBBF78 rbp 0x00007FF15E8D03A8
> Wed Dec 28 16:17:07.525 2016: [I] rsi 0x000000000001F713 rdi 0x000000000001E9A1
> Wed Dec 28 16:17:07.526 2016: [I] r8 0x0000000000000001 r9 0xFF092D63646B6860
> Wed Dec 28 16:17:07.527 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202
> Wed Dec 28 16:17:07.528 2016: [I] r12 0x00007FF1610C6847 r13 0x00007FF161032EC0
> Wed Dec 28 16:17:07.529 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000
> Wed Dec 28 16:17:07.530 2016: [I] rip 0x00007FF15E7861D7 eflags 0x0000000000000202
> Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000
> Wed Dec 28 16:17:07.532 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807
> Wed Dec 28 16:17:07.533 2016: [I] cr2 0x0000000000000000
> Wed Dec 28 16:17:09.022 2016: [D] Traceback:
> Wed Dec 28 16:17:09.023 2016: [D] 0:00007FF15E7861D7 raise + 37 at ??:0
> Wed Dec 28 16:17:09.024 2016: [D] 1:00007FF15E7878C8 __GI_abort + 148 at ??:0
> Wed Dec 28 16:17:09.025 2016: [D] 2:00007FF15E77F146 __assert_fail_base + 126 at ??:0
> Wed Dec 28 16:17:09.026 2016: [D] 3:00007FF15E77F1F2 __GI___assert_fail + 42 at ??:0
> Wed Dec 28 16:17:09.027 2016: [D] 4:00007FF1604F39D9 logAssertFailed + 2F9 at ??:0
> Wed Dec 28 16:17:09.028 2016: [D] 5:00007FF160CA8947 fs_config_ssds(fs_config*) + 867 at ??:0
> Wed Dec 28 16:17:09.029 2016: [D] 6:00007FF16009A749 SFSConfigLROC() + 189 at ??:0
> Wed Dec 28 16:17:09.030 2016: [D] 7:00007FF160E565CB NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0
> Wed Dec 28 16:17:09.031 2016: [D] 8:00007FF160E5EF41 NsdDiskConfig::reReadConfig() + 771 at ??:0
> Wed Dec 28 16:17:09.032 2016: [D] 9:00007FF160024E0E runTSControl(int, int, char**) + 80E at ??:0
> Wed Dec 28 16:17:09.033 2016: [D] 10:00007FF1604FA6A5 RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int, StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0
> Wed Dec 28 16:17:09.034 2016: [D] 11:00007FF1604FBA36 HandleCmdMsg(void*) + 1216 at ??:0
> Wed Dec 28 16:17:09.035 2016: [D] 12:00007FF160039172 Thread::callBody(Thread*) + 1E2 at ??:0
> Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302 Thread::callBodyWrapper(Thread*) + A2 at ??:0
> Wed Dec 28 16:17:09.037 2016: [D] 14:00007FF15F73FDC5 start_thread + C5 at ??:0
> Wed Dec 28 16:17:09.038 2016: [D] 15:00007FF15E84873D __clone + 6D at ??:0

On 12/28/16 4:16 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] wrote:
> related note I'm curious how a 3.5 client is able to join a cluster
> with a minreleaselevel of 4.1.1.0.

I was referring to the fs version not the gpfs client version sorry for that confusion

-V 13.23 (3.5.0.7) File system version

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
