Does anyone have experience running a Lustre 1.8.9 client against an LLNL 2.5.3
server (ZFS)?

I hit an LBUG on an IGIF FID assertion almost immediately after the mount:

dsg0515 kernel: LustreError: 30899:0:(mdc_fid.c:334:fid_le_to_cpu()) 
ASSERTION(fid_is_igif(dst) || fid_ver(dst) == 0) failed: 
[0x293e750000006ada:0x70f8b3:0xa721a500]
(full stack dump at the end of email).
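
For reference, a minimal Python sketch of the check named in the message,
applied to the two FIDs reported in this email. It is not the Lustre source:
the IGIF sequence range (FID_SEQ_IGIF, FID_SEQ_IGIF_MAX) is an assumption
based on 2.x-era headers and may differ in 1.8.9, and parse_fid and
assertion_holds are hypothetical helper names.

```python
# Sketch of the failing check, NOT the Lustre source.
# The IGIF sequence range below is an assumption (2.x-era headers).
FID_SEQ_IGIF = 12               # assumed lower bound of the IGIF range
FID_SEQ_IGIF_MAX = 0xFFFFFFFF   # assumed upper bound of the IGIF range


def parse_fid(text):
    """Parse '[seq:oid:ver]' as printed in the LBUG message."""
    parts = text.strip("[] \n").split(":")
    seq, oid, ver = (int(part.strip(), 16) for part in parts)
    return seq, oid, ver


def fid_is_igif(seq):
    """True if the sequence falls in the (assumed) IGIF range."""
    return FID_SEQ_IGIF <= seq <= FID_SEQ_IGIF_MAX


def assertion_holds(text):
    """Mirror of: fid_is_igif(dst) || fid_ver(dst) == 0"""
    seq, _oid, ver = parse_fid(text)
    return fid_is_igif(seq) or ver == 0


for fid in ("[0x293e750000006ada:0x70f8b3:0xa721a500]",
            "[0x600000005:0x7:0xffffffff]"):
    print(fid, assertion_holds(fid))
```

Under these assumptions both FIDs fail the check: the sequence is above the
IGIF range and the version field is non-zero.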

This happened only when I mounted two Lustre file systems (the old 1.8.9
servers and the new 2.5.3 servers) on the same 1.8.9 client during tests last
summer. The new 2.5.3 file system was freshly formatted, with only a little
data written to it from a 2.5.3 client.
I would like to try the LLNL 2.5.3 server with the 1.8.9 client again.

Apparently I'm missing something obvious.
I realize this is not a supported or "tested" configuration, but we have been
successfully running a similar configuration with Intel's last GA release,
2.5.3, on the servers for more than half a year, with HPC clusters doing IO on
both file systems and a few nodes doing 'cp' between the old and new file
systems, checksumming and stats.

We still need the double mount (1.8 and 2.5) for another month, until we
finish the migration. We will need to run 1.8.9 clients for six more months.
I'm trying to reassess whether I can use LLNL Lustre 2.5.3 on the reinstalled
servers in this configuration.

Lustre 1.8.9 (or 2.5.3) client with the LLNL 2.5.3 server only - runs fine.
Lustre 1.8.9 client mounting both the 1.8 servers and Intel's 2.5.3 servers -
runs fine.
Lustre 1.8.9 client mounting both the 1.8 servers and the LLNL 2.5.3 servers -
crashes after the mount or after a few operations.
I was able to make it last longer by mounting in a certain order and doing
"ls" on a few existing files, but it still crashes some time later during IO.

Most of the reported FIDs look real, but I also see
[0xdead000000100100 :0x200200 :0xdead0000]
[0x5a5a5a5a5a5a5a5a :0x5a5a5a5a :0x5a5a5a5a]

which correspond to poison values such as
CONFIG_ILLEGAL_POINTER_VALUE
# define LI_POISON ((int)0x5a5a5a5a)    or similar.
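
As a rough check of that reading, the Python sketch below reinterprets the
first FID as two 64-bit words and compares them with the kernel's list
poison pointers. The FID layout (u64 seq, u32 oid, u32 ver, little-endian)
and the LIST_POISON1/LIST_POISON2 values (x86_64, 2.6.32-era
include/linux/poison.h) are assumptions; fid_as_words is a hypothetical
helper name.

```python
import struct

# Assumed x86_64 values from 2.6.32-era include/linux/poison.h.
POISON_POINTER_DELTA = 0xDEAD000000000000   # CONFIG_ILLEGAL_POINTER_VALUE
LIST_POISON1 = 0x00100100 + POISON_POINTER_DELTA
LIST_POISON2 = 0x00200200 + POISON_POINTER_DELTA


def fid_as_words(seq, oid, ver):
    """Reinterpret a (u64 seq, u32 oid, u32 ver) FID as two little-endian u64."""
    raw = struct.pack("<QII", seq, oid, ver)
    return struct.unpack("<QQ", raw)


# First odd FID: do its 16 bytes look like a poisoned list_head (next, prev)?
words = fid_as_words(0xDEAD000000100100, 0x200200, 0xDEAD0000)
print([hex(w) for w in words])
print(words == (LIST_POISON1, LIST_POISON2))

# Second odd FID: are its 16 bytes a uniform 0x5a fill?
filled = struct.pack("<QII", 0x5A5A5A5A5A5A5A5A, 0x5A5A5A5A, 0x5A5A5A5A)
print(filled == b"\x5a" * 16)
```

If these assumptions hold, the first FID would be consistent with reading a
FID out of memory that overlaps an unlinked list_head, and the second with
reading memory that was filled with the 0x5a poison byte.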

I compared the 2.5.3-llnl branch with the Whamcloud branch 2_5 at tag 2.5.3,
and also with tag 2.5.3.90.
I did not find any commit messages related to IGIF FIDs among the commits that
differ, though I may have missed a code change in a patch whose commit message
does not mention it.
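
One way to search the changed code rather than the commit messages is git's
pickaxe option (git log -G<regex>), which lists commits whose diff adds or
removes a line matching the pattern. The sketch below only builds such a
command; the ref names are hypothetical placeholders to be replaced with the
real tag and branch, and pickaxe_cmd is a hypothetical helper name.

```python
def pickaxe_cmd(old_ref, new_ref, pattern, paths=()):
    """Build a `git log -G` command that matches changed code, not messages."""
    cmd = ["git", "log", "--oneline", f"-G{pattern}", f"{old_ref}..{new_ref}"]
    if paths:
        # Optionally restrict the search to specific files or directories.
        cmd += ["--", *paths]
    return cmd


# Hypothetical ref names; substitute the real upstream tag and LLNL branch.
cmd = pickaxe_cmd("UPSTREAM_2_5_3_TAG", "LLNL_2_5_3_BRANCH", "fid_is_igif")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run inside a Lustre checkout.
The two-dot range lists commits reachable from the second ref but not the
first, so swapping the refs shows the other side of the divergence.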

I would appreciate any hints on where to look to make this work, and on what
difference is causing this LBUG.

Thanks in advance, Alex.


Jun  1 15:01:10 dsg0515 kernel: LustreError: 4541:0:(mdc_fid.c:334:fid_le_to_cpu()) ASSERTION(fid_is_igif(dst) || fid_ver(dst) == 0) failed: [0x600000005:0x7:0xffffffff]
Jun  1 15:01:10 dsg0515 kernel: LustreError: 4541:0:(mdc_fid.c:334:fid_le_to_cpu()) LBUG
Jun  1 15:01:10 dsg0515 kernel: Pid: 4541, comm: ls
Jun  1 15:01:10 dsg0515 kernel:
Jun  1 15:01:10 dsg0515 kernel: Call Trace:
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810df1c4>] ? generic_permission+0x24/0xc0
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffffa0eb3847>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffffa0eb3de6>] lbug_with_loc+0x76/0xe0 [libcfs]
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffffa1135ee5>] fid_le_to_cpu+0xa5/0xb0 [mdc]
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffffa11a3c45>] ll_readdir+0x935/0xb00 [lustre]
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810d5b47>] ? nameidata_to_filp+0x57/0x70
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810af1d9>] ? __inc_zone_state+0x9/0x70
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810a3099>] ? __lru_cache_add+0x9/0x70
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810a3119>] ? lru_cache_add_lru+0x19/0x40
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810e5710>] ? filldir+0x0/0xf0
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810e5710>] ? filldir+0x0/0xf0
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810e58ac>] vfs_readdir+0xac/0xd0
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff810e5b66>] sys_getdents+0x86/0xe0
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff81420def>] ? page_fault+0x1f/0x30
Jun  1 15:01:10 dsg0515 kernel:  [<ffffffff8100b2fb>] system_call_fastpath+0x16/0x1b
Jun  1 15:01:10 dsg0515 kernel:
Jun  1 15:01:10 dsg0515 kernel: LustreError: dumping log to /tmp/lustre-log.1433188870.4541



