Greetings,

We have 4 OSS nodes and 2 MDS nodes configured in HA pairs, running 2.6.18-53.1.14.el5_lustre.1.6.5smp and using the o2ib network transport. We recently had multiple failovers (possibly due to hardware problems, but no root cause yet) and managed to get things back to what I _thought_ was a normal state.
However, in the system log we are seeing many "server_bulk_callback" error messages at a rate of ~6 per second. Interestingly, they only come from one HA pair of OSS nodes:

Sep 24 23:03:14 lfs-oss-0-3 kernel: LustreError: 20694:0:(events.c:361:server_bulk_callback()) event type 4, status -103, desc ffff81019fce6000
Sep 24 23:03:14 lfs-oss-0-3 kernel: LustreError: 20694:0:(events.c:361:server_bulk_callback()) event type 2, status -103, desc ffff81019fce6000
Sep 24 23:03:16 lfs-oss-0-2 kernel: LustreError: 27257:0:(events.c:361:server_bulk_callback()) event type 4, status -103, desc ffff8101b52b8000
Sep 24 23:03:16 lfs-oss-0-2 kernel: LustreError: 27257:0:(events.c:361:server_bulk_callback()) event type 2, status -103, desc ffff8101b52b8000

Can anyone direct me to documentation to decipher these messages? What does "server_bulk_callback" do, and does "status -103" indicate a severe problem for event types 2 and 4?

Thanks very much for your guidance,
Nathan

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
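
As a starting point for reading the status field in the messages above, here is a minimal Python sketch. It assumes the status is a negated Linux errno value, which is the usual kernel convention but is not confirmed here for this code path. The names `LINUX_ERRNO`, `LINE_RE`, and `decode` are hypothetical helpers, and the errno table is only a small excerpt. The sketch does not interpret the event type numbers.

```python
import re

# Small excerpt of Linux errno values (from asm-generic/errno*.h); not exhaustive.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    103: ("ECONNABORTED", "Software caused connection abort"),
    104: ("ECONNRESET", "Connection reset by peer"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches the syslog format shown in the messages above.
LINE_RE = re.compile(
    r"(?P<host>\S+) kernel: LustreError: .*?"
    r":(?P<func>\w+)\(\)\) event type (?P<type>\d+), "
    r"status (?P<status>-?\d+), desc (?P<desc>[0-9a-f]+)"
)

def decode(line):
    """Split one LustreError line into fields and name the errno, if known."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    status = int(m.group("status"))
    # Assumption: status is -errno, so negate it before the lookup.
    name, text = LINUX_ERRNO.get(-status, ("UNKNOWN", "not in this excerpt"))
    return {
        "host": m.group("host"),
        "function": m.group("func"),
        "event_type": int(m.group("type")),
        "status": status,
        "errno_name": name,
        "errno_text": text,
    }

sample = (
    "Sep 24 23:03:16 lfs-oss-0-2 kernel: LustreError: "
    "27257:0:(events.c:361:server_bulk_callback()) "
    "event type 4, status -103, desc ffff8101b52b8000"
)
print(decode(sample))
```

Under that assumption, -103 corresponds to ECONNABORTED ("Software caused connection abort") on Linux. Whether that is severe for these event types still needs an answer from someone who knows the bulk transfer code.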
