On 2010-08-12, at 15:08, Mark Nelson wrote:
> How do the kernel and storage on the OSSes aggregate writes when the number 
> of service threads is increased?

The OSS layer does not aggregate writes itself.  Aggregation is done on the 
client before the write RPCs are generated, or in the block device (elevator 
and/or cache for h/w RAID devices) at the bottom end.

There is a research project called "Network Request Scheduler" that aims to 
submit the IOs in a more coherent order at the OSS thread level, to facilitate 
block device merging, but it will not explicitly merge the IOs itself.
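
To illustrate why submission order matters for block device merging, here is a 
toy model in Python.  It is only a sketch under simplifying assumptions: the 
function name merge_adjacent and the (offset, length) request tuples are 
hypothetical, and this toy only back-merges a request with the most recently 
queued one, which is far simpler than a real elevator.

```python
MB = 1 << 20


def merge_adjacent(requests):
    """Toy model: merge a request into the previous one only when it starts
    exactly where the previous one ends. Each request is (offset, length)."""
    merged = []
    for offset, length in requests:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged


# Four 1MB RPCs for the same object, submitted by different service threads
# in an interleaved order (hypothetical example data).
interleaved = [(0, MB), (2 * MB, MB), (1 * MB, MB), (3 * MB, MB)]

# The same RPCs submitted in offset order.
ordered = sorted(interleaved)

print(len(merge_adjacent(interleaved)))  # number of requests after merging
print(len(merge_adjacent(ordered)))
```

In this model the interleaved submission reaches the device as four separate 
requests, while the ordered submission collapses into a single 4MB request.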

> The Lustre tuning section on the wiki mentions that there are "internal I/O 
> buffers".  How is aggregating those writes different from the way the dirty 
> cache on the clients works?
> 
> http://wiki.lustre.org/index.php/Lustre_Tuning

In 1.6 and earlier there was an explicit 1MB pre-allocated receive buffer for 
every thread, used to stage a single IO RPC from network RDMA and submit it to 
the block layer.  In 1.8 and later this 1MB of memory is dynamically allocated 
from the page cache, at least for the duration of the IO submission.  After 
that, depending on the /proc tunables (read_cache_enable, 
writethrough_cache_enable, readcache_max_filesize), it will either discard the 
pages immediately, or keep them in memory and let the VM evict them when there 
is memory pressure (if they are not accessed).
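
As a rough sketch of that decision, the Python model below shows how the three 
tunables could combine.  It is not the OSS code: the CacheTunables class and 
the keep_pages_after_io function are hypothetical names, None standing for "no 
size limit" is a modelling choice, and applying readcache_max_filesize to 
writes as well as reads is my assumption about how the limit is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheTunables:
    """Hypothetical container mirroring the three /proc tunables."""
    read_cache_enable: bool = True
    writethrough_cache_enable: bool = True
    # None models "no limit" (assumption made for this sketch).
    readcache_max_filesize: Optional[int] = None


def keep_pages_after_io(is_write: bool, file_size: int,
                        tunables: CacheTunables) -> bool:
    """Return True if the pages stay in the page cache after the IO is
    submitted (left for the VM to evict), False if they are discarded."""
    limit = tunables.readcache_max_filesize
    if limit is not None and file_size > limit:
        # Assumed: files above the limit are never kept, for reads or writes.
        return False
    if is_write:
        return tunables.writethrough_cache_enable
    return tunables.read_cache_enable


MB = 1 << 20
small_files_only = CacheTunables(readcache_max_filesize=32 * MB)
print(keep_pages_after_io(False, 4 * MB, small_files_only))    # small read
print(keep_pages_after_io(True, 1024 * MB, small_files_only))  # large write
```

With a 32MB limit, the model keeps the pages of the 4MB file after a read and 
discards the pages of the 1GB file after a write.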

> On 08/12/2010 12:35 PM, Andreas Dilger wrote:
>> On 2010-08-11, at 23:36, burlen wrote:
>>> I am interested in how write()s are buffered in Lustre on the client,
>>> server, and network in between. Specifically I'd like to understand what
>>> happens during writes when large number of clients are making large
>>> writes to all of the OSTs on an OSS, and the buffers are inadequate to
>>> handle the outgoing/incoming data.
>> 
>> Lustre doesn't buffer dirty pages on the OSS, only on the client.  The 
>> clients are granted a "reserve" of space in each OST filesystem to ensure 
>> there is enough free space for any cached writes that they do.
>> 
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>> 
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> 
> -- 
> Mark Nelson, Lead Software Developer
> Minnesota Supercomputing Institute
> Phone: (612)626-4479
> Email: [email protected]


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
