On Mar 22, 2010, at 9:52 AM, Alexander Sack wrote:
> On Mon, Mar 22, 2010 at 8:39 AM, John Baldwin <j...@freebsd.org> wrote:
>> On Monday 22 March 2010 7:40:18 am Gary Jennejohn wrote:
>>> On Sun, 21 Mar 2010 19:03:56 +0200
>>> Alexander Motin <m...@freebsd.org> wrote:
>>> 
>>>> Scott Long wrote:
>>>>> Are there non-CAM drivers that look at MAXPHYS, or that silently assume
>> that
>>>>> MAXPHYS will never be more than 128k?
>>>> 
>>>> That is a question.
>>>> 
>>> 
>>> I only did a quick&dirty grep looking for MAXPHYS in /sys.
>>> 
>>> Some drivers redefine MAXPHYS to be 512KiB.  Some use their own local
>>> MAXPHYS which is usually 128KiB.
>>> 
>>> Some look at MAXPHYS to figure out other things; the details escape me.
>>> 
>>> There's one driver which actually uses 100*MAXPHYS for something, but I
>>> didn't check the details.
>>> 
>>> Lots of them were non-CAM drivers AFAICT.
>> 
>> The problem is the drivers that _don't_ reference MAXPHYS.  The driver author
>> at the time "knew" that MAXPHYS was 128k, so he did the MAXPHYS-dependent
>> calculation and just put the result in the driver (e.g. only supporting up
>> to 32 segments (32 4k pages == 128k) in a bus dma tag by passing a magic
>> '32' for nsegments to bus_dma_tag_create(), without documenting that the
>> '32' was derived from 128k or what the actual hardware limit on nsegments
>> is).  These cannot be found by a simple grep; they require manually
>> inspecting each driver.
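
To make that concrete, here is a minimal sketch of the pattern John is
describing, using the stock bus_dma_tag_create() signature; the softc
fields and the HW_MAX_SEGS limit are hypothetical, purely for illustration:

    /*
     * The classic silent assumption: "32" only works while MAXPHYS is
     * 128k (32 x 4k pages), and nothing documents or enforces that.
     */
    error = bus_dma_tag_create(bus_get_dma_tag(dev),
        1, 0,                       /* alignment, boundary */
        BUS_SPACE_MAXADDR,          /* lowaddr */
        BUS_SPACE_MAXADDR,          /* highaddr */
        NULL, NULL,                 /* filter, filterarg */
        MAXPHYS,                    /* maxsize */
        32,                         /* nsegments: silently tied to 128k */
        BUS_SPACE_MAXSIZE_32BIT,    /* maxsegsize */
        0,                          /* flags */
        busdma_lock_mutex, &sc->sc_mtx,
        &sc->sc_dmat);

    /*
     * Spelling the number out in terms of MAXPHYS, capped at the
     * hardware's real S/G limit (HW_MAX_SEGS is a made-up name),
     * keeps the tag honest if MAXPHYS ever grows:
     */
    #define DRIVER_NSEGS    MIN((MAXPHYS / PAGE_SIZE) + 1, HW_MAX_SEGS)

A driver written the second way at least fails in an obvious place when
MAXPHYS changes, instead of quietly mismatching the tag against what the
upper layers will hand it.
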
> 
> 100% awesome comment.  On another kernel, I myself was guilty of this
> crime (though I did have a nice comment above the define).
> 
> This has been a great thread, since our application really needs some
> of the optimizations being thrown around here.  We have found in
> real-world performance testing that we are almost always either
> controller bound (i.e. adding more disks to spread IOPs has little to
> no effect on throughput in large array configurations; we suspect we
> are hitting the RAID controller's firmware limits) or tps bound; that
> is, I never expected going from 128k -> 256k per transaction to have a
> dramatic effect on throughput (but I never verified it).
> 
> Back to HBAs: AFAIK, every modern iteration of the most popular HBAs
> can easily do way more than a 128k scatter/gather I/O.  Do you guys
> know of any *modern* ones (say, from within the last 3-4 years) that
> cannot do more than 128k at a shot?

Anything over 64K is broken in MPT at the moment.  The hardware can do it,
the driver thinks it can do it, but it fails.  AAC hardware traditionally
cannot, but maybe the firmware has been improved in the past few years.  I
know that there are other low-performance devices that can't do more than
64 or 128K, but none are coming to mind at the moment.  Still, it shouldn't
be a universal assumption that all hardware can do big I/O's.
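
For what it's worth, this is the sort of limit a SIM can advertise through
the maxio field of the CAM path-inquiry CCB rather than leaving the upper
layers to guess; a rough sketch, assuming the upper layers actually consult
maxio, with 65536 standing in for whatever the hardware really supports:

    case XPT_PATH_INQ:
    {
        struct ccb_pathinq *cpi = &ccb->cpi;

        /* ... the rest of the usual path-inquiry fields ... */

        /*
         * Report the controller's real per-request limit instead of
         * letting the upper layers assume it can take MAXPHYS.
         */
        cpi->maxio = 65536;

        cpi->ccb_h.status = CAM_REQ_CMP;
        xpt_done(ccb);
        break;
    }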

Another consideration is that some hardware can do big I/O's, but not very 
efficiently.  Not all DMA engines are created equal, and moving to compound 
commands and excessively long S/G lists can be a pessimization.  For example, 
MFI hardware does a hinted prefetch on the segment list, but once you exceed a 
certain limit, that prefetch doesn't work anymore and the firmware has to take 
the slow path to execute the I/O.  I haven't quantified this penalty yet, but 
it's something that should be thought about.
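
If the penalty turns out to be significant, the knob is the same bus_dma tag
sizing shown earlier: deliberately cap maxsize/nsegments below what MAXPHYS
would otherwise allow, so the S/G list stays inside the prefetch window.  A
hedged sketch; MFI_FAST_SGL_SEGS is a made-up name for that (so far
unquantified) limit, not a real mfi(4) constant:

    /*
     * Hypothetical tuning: keep any single request short enough that the
     * firmware's segment-list prefetch covers the whole S/G list, and
     * feed these into bus_dma_tag_create()'s maxsize/nsegments arguments.
     */
    #define MFI_FAST_SGL_SEGS   32              /* made-up prefetch limit */
    #define MFI_FAST_MAXIO      (MFI_FAST_SGL_SEGS * PAGE_SIZE)

Measuring where that knee actually sits is exactly the quantification that
hasn't been done yet.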

> 
> In other words, I've always thought the limit was kernel imposed and
> not what the memory controller on the card can do (I certainly never
> got the impression talking with some of the IHVs over the years that
> they were designing their hardware for a 128k limit - I sure hope
> not!).

You'd be surprised at the engineering compromises and handicaps that are 
committed at IHVs because of misguided marketers.

Scott
