On Jan 28, 2011, at 10:04 AM, Andreas Dilger wrote:

>> In the absence of controls on the size of the page cache, or enough RAM to 
>> cache all of the inode and directory blocks in memory, another potential 
>> solution is to place the metadata on an SSD. One can generate a dm linear 
>> target table that carves up an ext3/ext4 filesystem such that the inode 
>> blocks go on one device, and the data blocks go on another. Ideally the 
>> inode blocks would be placed on an SSD. 
>> 
>> I've tried this with both ext3, and with ext4 using flex_bg to reduce the 
>> size of the dm table. IIRC the overhead is acceptable in both cases - 1us, 
>> on average.
> 
> I'd be quite interested to see the results of such testing.

I'm waiting for more hardware to show up so I can restart my testing. Hope to 
have some results to share in another 3-4 weeks. 
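
For reference, here is a minimal sketch of how a dm linear table like the one 
described above might be generated. It is an illustration under stated 
assumptions, not the script used in the testing: the device paths and the 
metadata extents are hypothetical placeholders (real extents would have to be 
read from the filesystem, e.g. from dumpe2fs output), and packing each 
device's segments back to back is just one possible layout. The only fixed 
part is the dm linear table line format, 
"<start> <length> linear <device> <offset>", with all values in 512-byte 
sectors.

```python
SECTOR_SIZE = 512  # device-mapper tables are expressed in 512-byte sectors


def build_dm_table(total_blocks, metadata_extents, block_size=4096,
                   ssd_dev="/dev/ssd", hdd_dev="/dev/hdd"):
    """Build dm linear table lines for a filesystem split across two devices.

    metadata_extents is a list of (start_block, length_blocks) pairs that
    should live on the SSD; everything else is mapped to the HDD.
    The device paths are hypothetical placeholders.
    """
    sectors_per_block = block_size // SECTOR_SIZE
    next_offset = {ssd_dev: 0, hdd_dev: 0}  # next free sector on each device
    lines = []

    def emit(start_block, n_blocks, dev):
        if n_blocks <= 0:
            return
        start = start_block * sectors_per_block
        length = n_blocks * sectors_per_block
        lines.append(f"{start} {length} linear {dev} {next_offset[dev]}")
        next_offset[dev] += length  # pack segments contiguously per device

    cursor = 0
    for start, length in sorted(metadata_extents):
        if start < cursor:
            raise ValueError("metadata extents overlap")
        emit(cursor, start - cursor, hdd_dev)  # data blocks before the extent
        emit(start, length, ssd_dev)           # the metadata extent itself
        cursor = start + length
    if cursor > total_blocks:
        raise ValueError("metadata extent runs past end of filesystem")
    emit(cursor, total_blocks - cursor, hdd_dev)  # trailing data blocks
    return lines


if __name__ == "__main__":
    # Toy geometry: two 32768-block groups, 1024 metadata blocks at the
    # start of each (made-up numbers, for illustration only).
    for line in build_dm_table(65536, [(0, 1024), (32768, 1024)]):
        print(line)
```

The resulting lines could be fed to dmsetup create as the table. With flex_bg 
the metadata of several groups is packed together, so the extent list, and 
therefore the table, gets correspondingly shorter.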

>> Placing the inodes on separate storage is not sufficient, though. Slow 
>> directory block reads contribute to poor stat performance as well. Adding a 
>> feature to ext4 to reserve a number of fixed block groups for directory 
>> blocks, and always allocating them there, would help. Those block groups 
>> could then be placed on an SSD as well.
> 
> I believe there is a heuristic that allocates directory blocks in the first 
> group of a flex_bg, so if that entire group is on SSD it would potentially 
> avoid this problem.

There is, though I haven't tested it yet. However, you'd need to have a 
relatively small number of flex_bgs for this to be cost-effective. I heard 
through the grapevine that you suggest not using "too few" flex_bgs on an ext4 
filesystem. Can you elaborate on what might be a reasonable number, and why?
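
To make the cost argument concrete, here is a rough back-of-the-envelope 
sketch. It assumes the whole first block group of every flex_bg is placed on 
SSD, and that a block group holds 8 * block_size blocks (128 MiB with 4 KiB 
blocks). The function name and the example sizes are illustrative only.

```python
def ssd_bytes_needed(fs_bytes, flex_bg_size, block_size=4096):
    """Estimate SSD capacity if the first group of each flex_bg is on SSD.

    Assumes one block bitmap block per group, so a group spans
    8 * block_size blocks (128 MiB for 4 KiB blocks).
    """
    group_bytes = 8 * block_size * block_size
    n_groups = -(-fs_bytes // group_bytes)   # ceiling division
    n_flex_bgs = -(-n_groups // flex_bg_size)
    return n_flex_bgs * group_bytes


if __name__ == "__main__":
    TiB = 2 ** 40
    for size in (16, 64, 256):
        need = ssd_bytes_needed(16 * TiB, size)
        print(f"flex_bg size {size:3d}: {need / 2**30:.0f} GiB of SSD")
```

Under these assumptions the SSD share is simply 1/flex_bg_size of the 
filesystem, which is why fewer, larger flex_bgs make the approach cheaper.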

Thanks,

Jason

--
Jason Rappleye
System Administrator
NASA Advanced Supercomputing Division
NASA Ames Research Center
Moffett Field, CA 94035

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
