Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=12333



During some testing with a large number of OSTs, I ran across a bug that causes
failures when you mount an OST with a large index.  I can reproduce this on an
x86_64 VMware session with the following script:


mkfs.lustre --mdt --mgs --fsname=test1 --device-size=100000K --reformat /tmp/mgs
mount -t lustre -o loop /tmp/mgs /mnt/mds

mkfs.lustre --ost [EMAIL PROTECTED] --fsname=test1 --index=4096 \
  --device-size=1000000 --reformat /tmp/ost
mount -t lustre -o loop /tmp/ost /mnt/ost1

Note the --index=4096 portion of the OST format line.  That index seems to be
about the limit on my x86_64 box, but it is more like 2048 on my ia64 boxes.

Doing this causes errors on the console like this:
Lustre: Server test1-OST1000 on device /dev/loop7 has started
Lustre: 3001:0:(quota_master.c:1105:mds_quota_recovery()) Not all osts are
active, abort quota recovery
LustreError: 3002:0:(llog_obd.c:324:llog_cat_initialize()) kmalloc of 'idarray'
(131104 bytes) failed at
/home/efelix/gits/lustre-1.5.97/lustre/obdclass/llog_obd.c:324
LustreError: 3002:0:(llog_obd.c:324:llog_cat_initialize()) 6006335 total bytes
allocated by Lustre, 302492 by Portals
Lustre: test1-OST1000: received MDS connection from [EMAIL PROTECTED]
LustreError: 3002:0:(lov_log.c:124:lov_llog_origin_connect()) error
osc_llog_connect tgt 4096 (-107)
LustreError: 3002:0:(mds_lov.c:665:__mds_lov_synchronize()) test1-MDT0000:
failed at llog_origin_connect: -107

The 131104-byte 'idarray' request works out to (index + 1) = 4097 entries of 32
bytes each, which is just over the largest generic kmalloc cache (131072 bytes,
i.e. 128K), so the allocation fails.  I guess this array needs to be managed
differently for large OST counts.
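
Just to show the arithmetic, here is a small userspace C sketch; the 32-byte
entry size is only an inference from the 131104-byte figure above (131104 =
4097 * 32), and 131072 stands in for the largest generic kmalloc cache on this
kernel:

#include <stdio.h>

#define IDARRAY_ENTRY_SIZE 32UL     /* assumed: 131104 / 4097 bytes per entry */
#define KMALLOC_MAX        131072UL /* largest default kmalloc cache (128K)   */

int main(void)
{
        unsigned long indexes[] = { 2048, 4095, 4096, 8192 };
        unsigned long i;

        for (i = 0; i < sizeof(indexes) / sizeof(indexes[0]); i++) {
                unsigned long bytes = (indexes[i] + 1) * IDARRAY_ENTRY_SIZE;

                printf("index %5lu -> idarray %7lu bytes (%s)\n",
                       indexes[i], bytes,
                       bytes > KMALLOC_MAX ? "too big for kmalloc" : "fits");
        }
        return 0;
}

With these assumptions, index 4095 still fits (131072 bytes) and index 4096 is
the first one to exceed the limit, which matches where the mount starts failing.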

A simple workaround is modifying <kernel>/include/linux/kmalloc_sizes.h and
adding a new cache size such as:

CACHE(262144)

which covers up to 8K OSTs (8192 * 32 bytes); use a larger size as needed.
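
Longer term, instead of patching the kernel header, the array could be managed
differently inside Lustre itself.  A rough sketch of that idea follows; the
names idarray_alloc/idarray_free are made up for illustration (this is not the
actual llog_cat_initialize() code), and the point is simply to fall back to
vmalloc() once the request is too big for kmalloc():

#include <linux/slab.h>
#include <linux/string.h>
#include <linux/vmalloc.h>

/* Sketch only: allocate the catalog-id array with kmalloc for small OST
 * counts and fall back to vmalloc once the request exceeds what the slab
 * caches can serve (e.g. 4097 * 32 = 131104 bytes for index 4096). */
static void *idarray_alloc(size_t size)
{
        void *ptr;

        if (size <= 128 * 1024)
                ptr = kmalloc(size, GFP_KERNEL);
        else
                ptr = vmalloc(size);
        if (ptr)
                memset(ptr, 0, size);
        return ptr;
}

static void idarray_free(void *ptr, size_t size)
{
        if (size <= 128 * 1024)
                kfree(ptr);
        else
                vfree(ptr);
}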
