On Sunday June 24, [EMAIL PROTECTED] wrote:
> Hi,
>
> We used to (long ago, 2.2.x), whenever we got a write request for some
> buffer, search the buffer cache to see if additional buffers which belong
> to that particular stripe are dirty, and then schedule them for writing
> as well, in an attempt to write full stripes. That resulted in a huge
> sequential write performance improvement.
>
> If such an approach is still possible today, it is preferable to delaying
> the writes
It is no longer possible, at least not usefully so. In fact, it is
probably not even preferable.
Since about 2.3.7, filesystem data has, by and large, not been stored
in the buffer cache; it is only stored in the page cache. So were
raid5 to go looking in the buffer cache, it would be unlikely to find
anything.
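The distinction matters because of how the two caches are indexed: the
buffer cache is keyed by (device, block number), which a block driver
like raid5 knows, while the page cache is keyed by (file, page index),
which it does not. Here is a userspace model of that indexing problem
(illustrative only; all structure and function names are invented, and
this is not kernel code):

/* Userspace model of the two cache indexes -- illustrative only.
 * The real kernel structures (struct buffer_head, struct page,
 * struct address_space) are more involved; all names here are
 * invented. */
#include <stdio.h>

struct buffer_entry {        /* buffer cache: keyed by device + block */
    int  dev;
    long block;              /* physical block number on the device */
    int  dirty;
};

struct page_entry {          /* page cache: keyed by file + index */
    long inode;              /* which file the page belongs to */
    long index;              /* page offset within that file   */
    int  dirty;
};

/* A driver sitting below the filesystem knows only (dev, block),
 * so it can search a buffer-cache-style index ... */
struct buffer_entry *lookup_buffer(struct buffer_entry *tbl, int n,
                                   int dev, long block)
{
    for (int i = 0; i < n; i++)
        if (tbl[i].dev == dev && tbl[i].block == block)
            return &tbl[i];
    return NULL;
}

/* ... but it holds no (inode, index) key, so it cannot usefully
 * search a page-cache-style index: the file-to-block mapping lives
 * in the filesystem, above the driver. */

int main(void)
{
    struct buffer_entry bufs[] = { { 8, 100, 1 }, { 8, 101, 1 } };
    struct buffer_entry *b = lookup_buffer(bufs, 2, 8, 101);

    printf("found dirty buffer: %s\n", b && b->dirty ? "yes" : "no");
    return 0;
}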
But there are other problems. The cache snooping only works if the
direct client of raid5 is a filesystem that stores data in the buffer
cache. If the filesystem is an indirect client, via LVM for example,
or even via a RAID0 array, then raid5 would not be able to look in
the "right" buffer cache, and so would find nothing. This was already
true in 2.2: if you tried LVM over RAID5 there, you wouldn't get good
write speed. You would also probably get data corruption while the
array was re-syncing, but that is a separate issue.
The current solution is much more robust. It cares nothing about the
way the raid5 array is used.
Also, while the handling of stripes is delayed, I don't believe that
this would actually show up as a measurable increase in latency. The
effect is really to have requests spend more time on a higher-level
queue and less time on a lower-level queue. The total time spent on
queues should normally be the same or less (due to improved
throughput), or only very slightly more in pathological cases.
NeilBrown
> for the partial buffer while hoping that the rest of the buffers in the
> stripe would come as well, since it both eliminates the additional delay,
> and doesn't depend on the order in which the buffers are flushed from the
> much bigger memory buffers to the smaller stripe cache.
>
I think the ideal solution would be to have the filesystem write data
in two stages, much like Unix apps can.
As soon as a buffer is dirtied (or more accurately, as soon as the
filesystem is happy for the data to be written), it is passed on with a
"WRITE_AHEAD" request. The driver is free to do what it likes,
including ignore this.
Later, at a time corresponding to "fsync" or "close" maybe, or when
memory is tight, the filesystem can send the buffer down with a
"WRITE" request which says "please write this *now*".
RAID5 could then gather all the write_ahead requests into a hash table
(not unlike the old buffer cache), and easily find full stripes for
writing.
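A sketch of how that gathering might look, as a userspace model (all
names and the stripe layout here are invented for illustration; this
is not a proposed patch): write-ahead requests are hashed by stripe
number, and a stripe whose data chunks are all present can be written
out in full, with parity computed from the new data alone.

/* Userspace model: gather WRITE_AHEAD requests by stripe in a hash
 * table, and notice when a full stripe is available.  All names
 * (stripe_bucket, DATA_DISKS, ...) are invented for illustration. */
#include <stdio.h>
#include <string.h>

#define DATA_DISKS 4            /* data chunks per stripe, parity excluded */
#define HASH_SIZE  128

struct stripe_bucket {
    long stripe;                /* stripe number, -1 if bucket unused */
    int  present;               /* how many chunks have arrived       */
    char have[DATA_DISKS];      /* which chunks we hold               */
};

static struct stripe_bucket table[HASH_SIZE];

/* Note one write-ahead block; return 1 when its stripe is complete and
 * can be written in full (no read-modify-write needed for parity). */
int note_write_ahead(long block)
{
    long stripe = block / DATA_DISKS;
    int  chunk  = block % DATA_DISKS;
    struct stripe_bucket *b = &table[stripe % HASH_SIZE];

    if (b->stripe != stripe) {  /* (a real table would handle collisions) */
        b->stripe  = stripe;
        b->present = 0;
        memset(b->have, 0, sizeof(b->have));
    }
    if (!b->have[chunk]) {
        b->have[chunk] = 1;
        b->present++;
    }
    return b->present == DATA_DISKS;
}

int main(void)
{
    for (int i = 0; i < HASH_SIZE; i++)
        table[i].stripe = -1;

    /* four consecutive blocks complete stripe 0 */
    for (long blk = 0; blk < 4; blk++)
        if (note_write_ahead(blk))
            printf("stripe %ld complete: schedule a full-stripe write\n",
                   blk / DATA_DISKS);
    return 0;
}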
But that is not going to happen in 2.4.
NeilBrown
> Cheers,
>
> Gadi