On Sunday June 24, [EMAIL PROTECTED] wrote:
> Hi,
> 
> We used to (long ago, 2.2.x), whenever we got a write request for some
> buffer,
> search the buffer cache to see if additional buffers which belong to that
> particular stripe are dirty, and then schedule them for writing as well, in
> an
> attempt to write full stripes. That resulted in a huge sequential write
> performance
> improvement.
> 
> If such an approach is still possible today, it is preferable to delaying
> the writes

 It is no longer possible, at least not usefully so.  In fact, it is
 probably not even preferable.

 Since about 2.3.7, filesystem data has, by and large, not been stored
 in the buffer cache; it is stored only in the page cache.  So were
 raid5 to go looking in the buffer cache, it would be unlikely to find
 anything.

 But there are other problems.  The cache snooping only works if the
 direct client of raid5 is a filesystem that stores data in the buffer
 cache.  If the filesystem is an indirect client, via LVM for example,
 or even via a RAID0 array, then raid5 would not be able to look in
 the "right" buffer cache, and so would find nothing.  This was
 already the case in 2.2: if you tried LVM over RAID5, you wouldn't
 get good write speed.  You would also probably get data corruption
 while the array was re-syncing, but that is a separate issue.
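
 To make the stacking problem concrete, here is a small user-space
 sketch, not kernel code, of a 2.2-style buffer cache hashed by
 (device, block).  The structures and names are invented for the
 illustration.  A filesystem sitting on top of LVM hashes its buffers
 under the LVM device, so raid5, looking them up under its own device,
 finds nothing:

#include <stdio.h>

#define HASH_SIZE 64

struct buffer_head {
    int  b_dev;                  /* device the buffer is hashed under */
    long b_blocknr;              /* block number on that device */
    struct buffer_head *b_next;  /* hash chain */
};

static struct buffer_head *hash_table[HASH_SIZE];

static unsigned hashfn(int dev, long block)
{
    return ((unsigned)dev ^ (unsigned)block) % HASH_SIZE;
}

static void insert_buffer(struct buffer_head *bh)
{
    unsigned h = hashfn(bh->b_dev, bh->b_blocknr);

    bh->b_next = hash_table[h];
    hash_table[h] = bh;
}

/* get_hash_table()-style lookup: find the buffer for (dev, block) */
static struct buffer_head *find_buffer(int dev, long block)
{
    struct buffer_head *bh;

    for (bh = hash_table[hashfn(dev, block)]; bh; bh = bh->b_next)
        if (bh->b_dev == dev && bh->b_blocknr == block)
            return bh;
    return NULL;
}

int main(void)
{
    enum { LVM_DEV = 1, RAID5_DEV = 2 };
    struct buffer_head bh = { LVM_DEV, 100, NULL };

    insert_buffer(&bh);   /* filesystem dirties block 100 of the LVM device */

    /* raid5 snoops for that block, but under its own device number */
    printf("lookup on LVM dev:   %s\n", find_buffer(LVM_DEV, 100) ? "found" : "missed");
    printf("lookup on raid5 dev: %s\n", find_buffer(RAID5_DEV, 100) ? "found" : "missed");
    return 0;
}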

 The current solution is much more robust.  It cares nothing about the
 way the raid5 array is used. 

 Also, while the handling of stripes is delayed, I don't believe that
 this would actually show as a measurable increase in latency.  The
 effect is really to have requests spend more time on a higher level
 queue, and less time on a lower level queue.  The total time on
 queues should normally be the same or less (due to improved
 throughput) or only very slightly more in pathological cases.

NeilBrown



> for the partial buffer while hoping that the rest of the buffers in the
> stripe would
> come as well, since it both eliminates the additional delay, and doesn't
> depend on the order in which the buffers are flushed from the much bigger
> memory buffers to the smaller stripe cache.
> 

I think the ideal solution would be to have the filesystem write data
in two stages, much like Unix apps can.
As soon as a buffer is dirtied (or more accurately, as soon as the
filesystem is happy for the data to be written), it is passed on with a
"WRITE_AHEAD" request.  The driver is free to do what it likes,
including ignore this.
Later, at a time corresponding to "fsync" or "close" maybe, or when
memory is tight, the filesystem can send the buffer down with a
"WRITE" request which says "please write this *now*".

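As a rough sketch of that interface: the fragment below is plain
user-space C, not the 2.4 block layer, and the request structure and
function name are invented for the example.  The point is only that
WRITE_AHEAD is a hint the driver may act on or drop, while WRITE is a
demand.

#include <stdio.h>

enum req_cmd { WRITE_AHEAD, WRITE };

struct request {
    enum req_cmd cmd;
    long sector;
};

static void make_request(struct request *req)
{
    switch (req->cmd) {
    case WRITE_AHEAD:
        /* a hint: queue it for opportunistic batching, or ignore it */
        printf("sector %ld: write-ahead hint\n", req->sector);
        break;
    case WRITE:
        /* a demand: the data must go to disk now */
        printf("sector %ld: must be written now\n", req->sector);
        break;
    }
}

int main(void)
{
    struct request early = { WRITE_AHEAD, 1024 };  /* buffer just dirtied */
    struct request hard  = { WRITE,       1024 };  /* fsync/close/memory pressure */

    make_request(&early);
    make_request(&hard);
    return 0;
}

A driver that ignores the hint loses nothing, because the hard WRITE
for the same data always follows.
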
RAID5 could then gather all the write_ahead requests into a hash table
(not unlike the old buffer cache), and easily find full stripes for
writing.
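
In toy form, and with the chunk-to-stripe mapping, structure names and
fixed pool all made up for the illustration, the gathering might look
like this: write-ahead chunks are filed in a hash keyed by stripe
number, and a full-stripe write is scheduled once every data chunk of
a stripe has been seen.

#include <stdio.h>

#define DATA_DISKS   4          /* data chunks per stripe (assumed) */
#define STRIPE_HASH 16

struct stripe {
    long stripe_nr;             /* which stripe these chunks belong to */
    int  have;                  /* bitmap of data chunks seen so far */
    struct stripe *next;        /* hash chain */
};

static struct stripe pool[STRIPE_HASH];
static struct stripe *hash[STRIPE_HASH];
static int used;

static struct stripe *get_stripe(long nr)
{
    unsigned h = (unsigned)nr % STRIPE_HASH;
    struct stripe *s;

    for (s = hash[h]; s; s = s->next)
        if (s->stripe_nr == nr)
            return s;
    s = &pool[used++];          /* no eviction in this toy version */
    s->stripe_nr = nr;
    s->have = 0;
    s->next = hash[h];
    hash[h] = s;
    return s;
}

/* called for each WRITE_AHEAD chunk; the chunk number determines the
 * stripe and the slot within it */
static void note_write_ahead(long chunk)
{
    struct stripe *s = get_stripe(chunk / DATA_DISKS);

    s->have |= 1 << (chunk % DATA_DISKS);
    if (s->have == (1 << DATA_DISKS) - 1)
        printf("stripe %ld complete: schedule a full-stripe write\n",
               s->stripe_nr);
}

int main(void)
{
    long chunk;

    /* a sequential writer dirties chunks 0..7: two full stripes */
    for (chunk = 0; chunk < 8; chunk++)
        note_write_ahead(chunk);
    return 0;
}

Only the easy case is shown; partly filled stripes would still go
through the normal read-modify-write path when the hard WRITE arrived.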

But that is not going to happen in 2.4.

NeilBrown


> Cheers,
> 
> Gadi