Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=12333
While testing with a large number of OSTs I ran across a bug that causes failures when mounting an OST with a large index. I can reproduce this on an x86_64 VMware session with the following script:

    mkfs.lustre --mdt --mgs --fsname=test1 --device-size=100000K --reformat /tmp/mgs
    mount -t lustre -o loop /tmp/mgs /mnt/mds
    mkfs.lustre --ost [EMAIL PROTECTED] --fsname=test1 --index=4096 --device-size=1000000 --reformat /tmp/ost
    mount -t lustre -o loop /tmp/ost /mnt/ost1

Note the --index=4096 portion of the OST format line. That seems to be about the limit on my x86_64 box; on my ia64 boxes it is more like 2048. Mounting the OST then produces errors like these on the console:

    Lustre: Server test1-OST1000 on device /dev/loop7 has started
    Lustre: 3001:0:(quota_master.c:1105:mds_quota_recovery()) Not all osts are active, abort quota recovery
    LustreError: 3002:0:(llog_obd.c:324:llog_cat_initialize()) kmalloc of 'idarray' (131104 bytes) failed at /home/efelix/gits/lustre-1.5.97/lustre/obdclass/llog_obd.c:324
    LustreError: 3002:0:(llog_obd.c:324:llog_cat_initialize()) 6006335 total bytes allocated by Lustre, 302492 by Portals
    Lustre: test1-OST1000: received MDS connection from [EMAIL PROTECTED]
    LustreError: 3002:0:(lov_log.c:124:lov_llog_origin_connect()) error osc_llog_connect tgt 4096 (-107)
    LustreError: 3002:0:(mds_lov.c:665:__mds_lov_synchronize()) test1-MDT0000: failed at llog_origin_connect: -107

The 'idarray' allocation grows with the number of OST indexes (131104 bytes here for index 4096, roughly 32 bytes per entry), so I guess this array needs to be managed differently for large OST counts; one possible approach is sketched at the end of this message.

A simple workaround is to modify <kernel>/include/linux/kmalloc_sizes.h and add a new cache size, such as CACHE(262144) (enough for roughly 8K OSTs), or larger as needed.
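Roughly, the added entry would go at the end of the existing CACHE() list; the exact layout and the #if guards around the larger sizes vary between kernel versions, so treat this as a sketch only:

    /* tail of include/linux/kmalloc_sizes.h (layout varies by kernel version) */
            CACHE(65536)
            CACHE(131072)
            CACHE(262144)   /* added: lets kmalloc() satisfy the ~256K 'idarray' request */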

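As for managing the array differently, one option would be to stop relying on kmalloc() entirely for large OST counts and fall back to vmalloc() above some cutoff. The helper names (idarray_alloc/idarray_free) and the 128 KB threshold below are only illustrative, not anything Lustre currently has; this is just a sketch of the general idea:

    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/vmalloc.h>

    /* Largest request we hand to kmalloc(); above this, use vmalloc().
     * Cutoff and helper names are illustrative only. */
    #define IDARRAY_KMALLOC_MAX     (128 * 1024)

    static void *idarray_alloc(size_t size)
    {
            void *ptr;

            if (size <= IDARRAY_KMALLOC_MAX)
                    ptr = kmalloc(size, GFP_KERNEL);
            else
                    ptr = vmalloc(size);    /* no slab cache of this size needed */

            if (ptr != NULL)
                    memset(ptr, 0, size);
            return ptr;
    }

    static void idarray_free(void *ptr, size_t size)
    {
            if (size <= IDARRAY_KMALLOC_MAX)
                    kfree(ptr);
            else
                    vfree(ptr);
    }

llog_cat_initialize() (and its callers) would then allocate and free the idarray through these helpers instead of calling kmalloc()/kfree() directly, so a 4096- or 8192-OST filesystem no longer depends on an oversized slab cache existing.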