On Mar 22, 2010, at 9:52 AM, Alexander Sack wrote:

> On Mon, Mar 22, 2010 at 8:39 AM, John Baldwin <j...@freebsd.org> wrote:
>> On Monday 22 March 2010 7:40:18 am Gary Jennejohn wrote:
>>> On Sun, 21 Mar 2010 19:03:56 +0200
>>> Alexander Motin <m...@freebsd.org> wrote:
>>>
>>>> Scott Long wrote:
>>>>> Are there non-CAM drivers that look at MAXPHYS, or that silently
>>>>> assume that MAXPHYS will never be more than 128k?
>>>>
>>>> That is a question.
>>>>
>>>
>>> I only did a quick & dirty grep looking for MAXPHYS in /sys.
>>>
>>> Some drivers redefine MAXPHYS to be 512KiB.  Some use their own local
>>> MAXPHYS, which is usually 128KiB.
>>>
>>> Some look at MAXPHYS to figure out other things; the details escape me.
>>>
>>> There's one driver which actually uses 100*MAXPHYS for something, but I
>>> didn't check the details.
>>>
>>> Lots of them were non-CAM drivers, AFAICT.
>>
>> The problem is the drivers that _don't_ reference MAXPHYS.  The driver
>> author at the time "knew" that MAXPHYS was 128k, so he did the
>> MAXPHYS-dependent calculation and just put the result in the driver
>> (e.g. only supporting up to 32 segments (32 4k pages == 128k) in a bus
>> dma tag, passed as a magic number to bus_dma_tag_create() without
>> documenting that the '32' was derived from 128k or what the actual
>> hardware limit on nsegments is).  These cannot be found by a simple
>> grep; they require manually inspecting each driver.
>
> 100% awesome comment.  On another kernel, I myself was guilty of this
> crime (I did have a nice comment above the def, though).
>
> This has been a great thread, since our application really needs some
> of the optimizations that are being thrown around here.  We have found
> in real-life performance testing that we are almost always either
> controller bound (i.e. adding more disks to spread IOPS has little to
> no effect on throughput in large array configurations; we suspect that
> is hitting the RAID controller's firmware limitations) or TPS bound;
> i.e. I never thought going from 128k -> 256k per transaction would
> have a dramatic effect on throughput (but I never verified).
>
> Back to HBAs: AFAIK, every modern iteration of the most popular HBAs
> can easily do way more than a 128k scatter/gather I/O.  Do you guys
> know of any *modern* HBA (circa the last 3-4 years) that can not do
> more than 128k at a shot?
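
To make John's point above concrete, here is a minimal sketch of the two
styles.  The "xx" driver, its XX_HW_MAX_SEGS limit, and both functions are
invented for illustration and do not come from any real driver; the first
tag bakes the 128k assumption in as a bare '32', while the second derives
the segment count from MAXPHYS and clamps it to the hardware limit, so it
follows MAXPHYS if that ever grows.

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

#define XX_HW_MAX_SEGS  128     /* assumed hardware S/G limit (hypothetical) */

static int
xx_alloc_dma_tag_magic(device_t dev, bus_dma_tag_t *tag)
{
        /*
         * The "magic number" style: 32 segments * 4k pages == 128k, but
         * nothing here says so, and nothing complains if MAXPHYS grows.
         */
        return (bus_dma_tag_create(bus_get_dma_tag(dev),
            1, 0,                       /* alignment, boundary */
            BUS_SPACE_MAXADDR_32BIT,    /* lowaddr */
            BUS_SPACE_MAXADDR,          /* highaddr */
            NULL, NULL,                 /* filter, filterarg */
            128 * 1024,                 /* maxsize: really MAXPHYS */
            32,                         /* nsegments: really MAXPHYS/PAGE_SIZE */
            128 * 1024,                 /* maxsegsz */
            0, NULL, NULL, tag));
}

static int
xx_alloc_dma_tag_maxphys(device_t dev, bus_dma_tag_t *tag)
{
        /* Derive the segment count from MAXPHYS, clamped to the hardware. */
        int nsegs = MIN((MAXPHYS / PAGE_SIZE) + 1, XX_HW_MAX_SEGS);

        return (bus_dma_tag_create(bus_get_dma_tag(dev),
            1, 0, BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR,
            NULL, NULL,
            MAXPHYS,                    /* maxsize */
            nsegs,                      /* nsegments */
            MAXPHYS,                    /* maxsegsz */
            0, NULL, NULL, tag));
}
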
I/O's larger than 64K are broken in MPT at the moment.  The hardware can
do it, the driver thinks it can do it, but it fails.  AAC hardware
traditionally cannot, but maybe the firmware has been improved in the past
few years.  I know that there are other low-performance devices that can't
do more than 64 or 128K, but none are coming to mind at the moment.  Still,
it shouldn't be a universal assumption that all hardware can do big I/O's.

Another consideration is that some hardware can do big I/O's, but not very
efficiently.  Not all DMA engines are created equal, and moving to compound
commands and excessively long S/G lists can be a pessimization.  For
example, MFI hardware does a hinted prefetch on the segment list, but once
you exceed a certain limit that prefetch no longer works and the firmware
has to take the slow path to execute the I/O.  I haven't quantified this
penalty yet, but it's something that should be thought about.

>
> In other words, I've always thought the limit was kernel imposed and
> not what the memory controller on the card can do (I certainly never
> got the impression, talking with some of the IHVs over the years, that
> they were designing their hardware for a 128k limit - I sure hope
> not!).

You'd be surprised at the engineering compromises and handicaps that are
committed at IHVs because of misguided marketers.

Scott
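
The efficiency point above can be sketched the same way.  Everything below
(the "yy" driver and both segment-count constants) is hypothetical and only
meant to show the shape of the trade-off: advertise the list length the DMA
engine handles on its fast path, not the largest list it can technically
address.

#include <sys/param.h>

#define YY_HW_MAX_SGL   256     /* segments the hardware can address */
#define YY_FAST_SGL     64      /* segments covered by the firmware prefetch */

static int
yy_pick_nsegments(void)
{
        /*
         * Beyond YY_FAST_SGL the (hypothetical) firmware falls back to a
         * slow path, so a longer S/G list is a pessimization, not a win.
         */
        int nsegs = MIN(YY_FAST_SGL, YY_HW_MAX_SGL);

        /* No point advertising more segments than MAXPHYS can ever need. */
        return (MIN(nsegs, (MAXPHYS / PAGE_SIZE) + 1));
}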