Dear list,

When I sent test jobs (dd of 5G files, 8 jobs per node) to these nodes, I got errors like:

Mar 3 11:14:48 bws0091 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50...@tcp. The obd_ping operation failed with -107
Mar 3 11:14:48 bws0091 kernel: LustreError: Skipped 69 previous similar messages
Mar 3 11:14:48 bws0091 kernel: LustreError: 167-0: This client was evicted by besfs-OST0010; in progress operations using this service will fail.
Mar 3 11:15:51 bws0091 kernel: LustreError: 4959:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-192.168.50...@tcp, match 4016570 length 1408 too big: 1008 left, 1008 allowed
Mar 3 11:27:17 bws0091 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50...@tcp. The obd_ping operation failed with -107
Mar 3 11:27:17 bws0091 kernel: LustreError: Skipped 66 previous similar messages
Mar 3 11:27:17 bws0091 kernel: LustreError: 167-0: This client was evicted by besfs-OST0010; in progress operations using this service will fail.
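The -107 in the obd_ping failures is a negated Linux errno: 107 is ENOTCONN ("Transport endpoint is not connected"). When rerunning the dd test, it can help to count evictions per target. A minimal sketch, assuming the kernel messages land in a syslog file such as /var/log/messages on the client; `summarize_evictions` is a hypothetical helper, not a Lustre tool.

```shell
#!/bin/sh
# Hypothetical helper: count "This client was evicted by <target>;" messages
# per Lustre target in syslog text read from stdin.
summarize_evictions() {
    # Keep only the "evicted by <target>" part of each matching line,
    # strip the prefix, then count how often each target name occurs.
    grep -o 'evicted by [^;]*' |
        sed 's/^evicted by //' |
        sort |
        uniq -c
}
```

For example, `summarize_evictions < /var/log/messages`. Lustre rate-limits repeated console errors (the "Skipped 69 previous similar messages" lines), so counts taken from the log are a lower bound rather than an exact total.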
------------------
Lu Wang
2009-03-03

-------------------------------------------------------------
From: Lu Wang
Sent: 2009-03-03 10:14:58
To:
Cc: lustre-discuss
Subject: Re: [Lustre-discuss] Process accessing Lustre be killed onLustreclient

# lctl get_param ldlm.namespaces.*osc*.lru_size
ldlm.namespaces.besfs-OST0000-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0001-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0002-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0003-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0004-osc-f7dfe400.lru_size=1
ldlm.namespaces.besfs-OST0005-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0006-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0007-osc-f7dfe400.lru_size=1
ldlm.namespaces.besfs-OST0008-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST0009-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST000a-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST000b-osc-f7dfe400.lru_size=0
ldlm.namespaces.besfs-OST000c-osc-f7dfe400.lru_size=0
....

I got "0" for lru_size; according to the Lustre manual, it means "automatic resizing...". Is the memory pressure caused by an uncontrolled LRU size?

----------------
Lu Wang
2009-03-03

-------------------------------------------------------------
From: Johann Lombardi
Sent: 2009-03-02 18:04:13
To: Lu Wang
Cc: lustre-discuss
Subject: Re: [Lustre-discuss] Process accessing Lustre be killed on Lustreclient

On Mon, Mar 02, 2009 at 04:10:46PM +0800, Lu Wang wrote:
> My question is:
> 1.Does Lustre client requires a lot of low memory?

There is one known issue with the lru resize feature on i686 (it can consume almost all the low memory). To find out whether this is the same problem, could you please try to disable lru resize on the client side and see if you hit this bug again?
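To see whether an i686 client really is running short of low memory, which is the symptom described for the lru resize issue, one option is to watch the low-memory counters while the dd jobs run. On 32-bit kernels built with HIGHMEM, /proc/meminfo reports `LowTotal` and `LowFree` (in kB); on 64-bit kernels these lines are absent. A minimal sketch; `lowmem_report` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarize low-memory usage from /proc/meminfo-style
# text on stdin. Values in meminfo are reported in kB.
lowmem_report() {
    awk '
        $1 == "LowTotal:" { total = $2 }
        $1 == "LowFree:"  { free = $2 }
        END {
            # No LowTotal line means this kernel has no separate low zone.
            if (total == "") {
                print "no LowTotal/LowFree lines (not a 32-bit HIGHMEM kernel?)"
                exit
            }
            printf "LowFree %d kB of %d kB (%.1f%% free)\n", free, total, 100 * free / total
        }'
}
```

For example, `lowmem_report < /proc/meminfo`, or sample it during a test run with `while sleep 10; do lowmem_report < /proc/meminfo; done`.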
To disable lru resize, you have to run the following commands on the client(s):

lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100))
lctl set_param ldlm.namespaces.*mdc*.lru_size=$((NR_CPU*100))

where NR_CPU is the number of CPUs on the client.

Cheers,
Johann

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
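The two `lctl set_param` commands above can be wrapped in a small script that computes NR_CPU itself. This is a sketch, assuming `getconf _NPROCESSORS_ONLN` reports the client's CPU count; the function names and the `DRY_RUN` switch are hypothetical conveniences, not lctl options. The parameter names are exactly the ones from the thread.

```shell
#!/bin/sh
# Sketch: disable LDLM LRU resizing on a Lustre client by pinning lru_size
# to NR_CPU * 100 for the osc and mdc namespaces.
# static_lru_size, disable_lru_resize and DRY_RUN are hypothetical names.

# Print NR_CPU * 100. $1 optionally overrides the CPU count; by default the
# number of online CPUs is taken from getconf.
static_lru_size() {
    ncpu=${1:-$(getconf _NPROCESSORS_ONLN)}
    echo $((ncpu * 100))
}

# Apply the fixed size to every osc and mdc namespace. With DRY_RUN=1 the
# lctl commands are only printed, which is handy for checking the values.
disable_lru_resize() {
    size=$(static_lru_size "$1")
    for ns in osc mdc; do
        # Quoted so the shell passes the pattern to lctl unexpanded;
        # lctl matches the wildcard against its parameter names itself.
        param="ldlm.namespaces.*${ns}*.lru_size=${size}"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "lctl set_param $param"
        else
            lctl set_param "$param"
        fi
    done
}
```

For example, preview with `DRY_RUN=1 disable_lru_resize`, then run `disable_lru_resize` as root on each client. The `lctl get_param ldlm.namespaces.*osc*.lru_size` command from Lu Wang's message can be rerun afterwards to inspect the values. Per the manual text Lu Wang refers to, writing 0 back should return the namespaces to automatic resizing.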
