On Jan 28, 2011, at 10:04 AM, Andreas Dilger wrote:

>> In the absence of controls on the size of the page cache, or enough RAM to
>> cache all of the inode and directory blocks in memory, another potential
>> solution is to place the metadata on an SSD. One can generate a dm linear
>> target table that carves up an ext3/ext4 filesystem such that the inode
>> blocks go on one device, and the data blocks go on another. Ideally the
>> inode blocks would be placed on an SSD.
>>
>> I've tried this with both ext3, and with ext4 using flex_bg to reduce the
>> size of the dm table. IIRC the overhead is acceptable in both cases - 1us,
>> on average.
>
> I'd be quite interested to see the results of such testing.
I'm waiting for more hardware to show up so I can restart my testing. Hope to
have some results to share in another 3-4 weeks.

>> Placing the inodes on separate storage is not sufficient, though. Slow
>> directory block reads contribute to poor stat performance as well. Adding a
>> feature to ext4 to reserve a number of fixed block groups for directory
>> blocks, and always allocating them there, would help. Those block groups
>> could then be placed on an SSD as well.
>
> I believe there is a heuristic that allocates directory blocks in the first
> group of a flex_bg, so if that entire group is on SSD it would potentially
> avoid this problem.

There is, though I haven't tested it yet. However, you'd need to have a
relatively small number of flex_bgs for this to be cost-effective. I heard
through the grapevine that you suggest not using "too few" flex_bgs on an ext4
filesystem. Can you elaborate on what might be a reasonable number, and why?

Thanks,

Jason

--
Jason Rappleye
System Administrator
NASA Advanced Supercomputing Division
NASA Ames Research Center
Moffett Field, CA 94035
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
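
For readers unfamiliar with the dm linear table idea discussed in this thread, here is a minimal sketch of how such a table might be generated. It is not the script used for the testing mentioned above. The function name build_linear_table, the device paths, and the extent list are all hypothetical; the sketch assumes the metadata extents (for example, the inode table ranges) are already known in 512-byte sectors, and that each device is filled sequentially from offset 0. Only the table line format, "start length linear device offset", comes from device-mapper itself.

```python
def build_linear_table(total_sectors, metadata_extents, ssd_dev, hdd_dev):
    """Return dm-linear table lines that map metadata extents to ssd_dev
    and all remaining sectors to hdd_dev.

    metadata_extents is a list of (start, length) pairs in 512-byte
    sectors, in the logical address space of the combined device.
    Both backing devices are packed sequentially (an assumption made
    for this sketch, not something stated in the thread).
    """
    lines = []
    ssd_off = 0   # next free sector on the SSD
    hdd_off = 0   # next free sector on the disk
    cursor = 0    # next logical sector not yet mapped

    for start, length in sorted(metadata_extents):
        if length <= 0 or start < cursor or start + length > total_sectors:
            raise ValueError("overlapping or out-of-range extent")
        if start > cursor:
            # Data region before this metadata extent goes to the disk.
            gap = start - cursor
            lines.append(f"{cursor} {gap} linear {hdd_dev} {hdd_off}")
            hdd_off += gap
        # The metadata extent itself goes to the SSD.
        lines.append(f"{start} {length} linear {ssd_dev} {ssd_off}")
        ssd_off += length
        cursor = start + length

    if cursor < total_sectors:
        # Trailing data region after the last metadata extent.
        lines.append(
            f"{cursor} {total_sectors - cursor} linear {hdd_dev} {hdd_off}"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical layout: two small metadata extents in a 1000-sector device.
    table = build_linear_table(
        1000, [(100, 50), (500, 50)], "/dev/sdb", "/dev/sdc"
    )
    print("\n".join(table))
```

The printed table could then be passed to dmsetup create on standard input. Each metadata extent adds up to two table lines, which is presumably why using flex_bg to cluster the inode tables shrinks the table.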
