On Wednesday April 19, [EMAIL PROTECTED] wrote:
> >>>>> Neil Brown (NB) writes:
> 
>  NB> raid5 shouldn't need to merge small requests into large requests.
>  NB> That is what the 'elevator' or io_scheduler algorithms are for.  They
>  NB> already merge multiple bios into larger 'requests'.  If they aren't
>  NB> doing that, then something needs to be fixed.
> 
> hmm. then why do filesystems try to allocate big chunks and submit them
> at once? what's the point of having the bio subsystem?

I've often wondered this....

The rationale for creating large bios has to do with code path length.
Making small requests and sending each one down the block device stack
results in long code paths being called over and over again, each call
doing almost exactly the same thing.  This isn't nice to the L1 cache.

Creating a large request and sending it down once means the long path
is traversed less often.
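
To make that concrete, here is an untested sketch of what a caller does
today (assuming the 2.6-era bio API, bio_alloc/bio_add_page/submit_bio;
treat the exact fields and the missing completion handling as
illustration only, not a patch):

/*
 * Build one large bio covering nr_pages pages and push it down the
 * block layer once, instead of calling submit_bio() once per page.
 * A real caller would also set bi_end_io/bi_private, and would issue
 * another bio for any pages left over when a queue limit is hit.
 */
static void submit_pages_as_one_bio(struct block_device *bdev,
				    sector_t sector,
				    struct page **pages, int nr_pages)
{
	struct bio *bio = bio_alloc(GFP_NOIO, nr_pages);
	int i;

	bio->bi_bdev = bdev;
	bio->bi_sector = sector;

	for (i = 0; i < nr_pages; i++)
		if (!bio_add_page(bio, pages[i], PAGE_SIZE, 0))
			break;	/* queue limit reached; stop here */

	/* one trip down generic_make_request() instead of nr_pages trips */
	submit_bio(WRITE, bio);
}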

However I would have built a linked-list of very lightweight
structures and passed that down...
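
(A purely hypothetical sketch of such a lightweight structure; nothing
like this exists in the tree.  Each element would carry only the page,
sector and length, and the whole chain would make one trip down the
submission path.)

struct lw_io {
	struct lw_io	*next;		/* next element in the chain */
	struct page	*page;		/* data for this element */
	sector_t	sector;		/* where it goes on the device */
	unsigned int	len;		/* length in bytes */
};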

> 
>  NB> It is certainly possible that raid5 is doing something wrong that
>  NB> makes merging harder - maybe sending bios in the wrong order, or
>  NB> sending them with unfortunate timing.  And if that is the case it
>  NB> certainly makes sense to fix it.  
>  NB> But I really don't see that raid5 should be merging requests together
>  NB> - that is for a lower-level to do.
> 
> well, another thing is that it's extremely cheap to merge them in raid5
> because we know the request size and which stripes it covers. at the same
> time the block layer doesn't know that and needs to _search_ for where
> to merge.

For write requests, I don't think there is much gain here.  By the
time you have done all the parity updates, you have probably lost
track of what follows what.

For read requests on a working drive, I'd like to simply bypass the
stripe cache altogether as I outlined in a separate email on
linux-raid a couple of weeks ago.
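
Roughly along these lines (untested illustration only:
compute_dev_and_sector() is a stand-in for the geometry code raid5
already has in raid5_compute_sector(), the completion plumbing, a
bi_end_io on the clone that ends the original bio, is omitted, as is
the check that the read doesn't cross a chunk boundary):

static int raid5_read_bypass(raid5_conf_t *conf, struct bio *bi)
{
	unsigned int dd_idx;
	sector_t dev_sector;
	mdk_rdev_t *rdev;
	struct bio *clone;

	/* map the array sector onto (data disk, sector on that disk) */
	dev_sector = compute_dev_and_sector(conf, bi->bi_sector, &dd_idx);

	rdev = conf->disks[dd_idx].rdev;
	if (!rdev || test_bit(Faulty, &rdev->flags))
		return 0;	/* degraded: fall back to the stripe cache */

	/* send the read straight to the component device */
	clone = bio_clone(bi, GFP_NOIO);
	clone->bi_bdev = rdev->bdev;
	clone->bi_sector = dev_sector + rdev->data_offset;
	generic_make_request(clone);
	return 1;
}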

> 
>  NB> This implies 3 millisecs have passed since the queue was plugged, which
>  NB> is a long time.....
>  NB> I guess what could be happening is that the queue is being unplugged
>  NB> every 3msec whether it is really needed or not.
>  NB> i.e. we plug the queue, more requests come, the stripes we plugged the
>  NB> queue for get filled up and processed, but the timer never gets reset.
>  NB> Maybe we need to find a way to call blk_remove_plug when there are no
>  NB> stripes waiting for pre-read...
> 
>  NB> Alternately, stripes on the delayed queue could get a timestamp, and
>  NB> only get removed if they are older than 3msec.  Then we would replug
>  NB> the queue if there were some new stripes left....
> 
> could we somehow mark all stripes that belong to a given incoming request
> in make_request() and skip them in raid5_activate_delayed()? after the
> whole incoming request is processed, drop the mark.

Again, I don't think that the logic should be based on a given
incoming request.  Yes, something needs to be done here, but I think
it should essentially be time-based rather than incoming-request
based.
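
Something like the following, perhaps (an untested sketch of the
timestamp idea: the delayed_jiffies field is hypothetical, the rest
mirrors the 2.6-era raid5_activate_delayed()):

/*
 * Activate only those delayed stripes that have genuinely waited
 * ~3 msec; anything younger stays on delayed_list so the unplug
 * timer can come back for it.  sh->delayed_jiffies would be stamped
 * with jiffies when the stripe is first delayed.
 */
static void raid5_activate_old_delayed(raid5_conf_t *conf)
{
	struct stripe_head *sh, *tmp;
	unsigned long cutoff = jiffies - msecs_to_jiffies(3);

	list_for_each_entry_safe(sh, tmp, &conf->delayed_list, lru) {
		if (time_after(sh->delayed_jiffies, cutoff))
			continue;	/* delayed less than 3 msec ago */
		list_del_init(&sh->lru);
		clear_bit(STRIPE_DELAYED, &sh->state);
		if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
			atomic_inc(&conf->preread_active_stripes);
		list_add_tail(&sh->lru, &conf->handle_list);
	}
	/* caller re-plugs the queue if delayed_list is still non-empty */
}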

However you are welcome to try things out and see if you can make it
work faster.  If you can, I'm sure your results will be a significant
contribution to whatever ends up being the final solution.

NeilBrown