Re: [networking-discuss] NIC is only functional when 'snoop' is running

2010-07-26 Thread Paul Durrant
Are you sure it's not hardware? When snoop is run, it tells your
NIC driver to enable 'promiscuous' mode, i.e. to stop filtering received
packets on MAC address. So is your h/w definitely programmed with the
correct MAC address?

  Paul

On 24 July 2010 02:38, Andrew Chace andrew.ch...@gmail.com wrote:
 It's definitely not a hardware problem: I disabled nwam, and deleted all 
 VNICs and the problem went away. I'm still trying to figure out how and why 
 VNICs are preventing the NIC from working correctly.
 --
 This message posted from opensolaris.org
 ___
 networking-discuss mailing list
 networking-discuss@opensolaris.org




-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] is it a problem if the cpu bound with a network card is 100% used?

2010-02-03 Thread Paul Durrant
On 3 February 2010 09:40, Daniel, Wu dtrace...@gmail.com wrote:
 The network card interrupt is bound to CPU 7. If CPU 7 is 100% used, but the 
 other CPUs are less than 30% used, will the network latency increase?

 I am not sure whether, after binding the network interrupt to one CPU, other 
 CPUs can help to serve the network traffic in interrupt mode and polling mode.

Depends largely on how the driver is written and what latency you're
measuring. Are you looking at application-to-application round-trip
times? If so, then clearly your app. needs to be scheduled to process
the data, and if the core it happens to be running on is constantly
pre-empted by interrupts then latency will naturally increase.

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Fwd: [network-sip-discuss] buffers not returing from stack (gldv3 network driver)

2009-09-16 Thread Paul Durrant

pradeep gopana wrote:


Hi,
I'm developing a GLDv3 network driver, and I often face problems
while detaching my module. When heavy traffic is running on that
interface and I try to remove the module, I end up with some buffers
not being returned from the stack. I fail the detach while buffers are
held. Is there any way to push the stack to return the completions?



In a word, no.

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] problem with GLDv3 when m_instance != 0

2009-09-10 Thread Paul Durrant

Garrett D'Amore wrote:
Pretty sure I've nailed it.  I need to do some more debugging, but 
softmac makes a horrible assumption that ppa == ddi_get_instance(dip) 
right at the start of softmac_create().  I'll probably change this to 
use the minor number from the dev_t field instead.




I recall that ppa == ddi_get_instance(dip) is a hard requirement from 
somewhere. I think I filed a PSARC case for it years ago. I'll go look...


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] problem with GLDv3 when m_instance != 0

2009-09-10 Thread Paul Durrant

Paul Durrant wrote:

Garrett D'Amore wrote:
Pretty sure I've nailed it.  I need to do some more debugging, but 
softmac makes a horrible assumption that ppa == ddi_get_instance(dip) 
right at the start of softmac_create().  I'll probably change this to 
use the minor number from the dev_t field instead.




I recall that ppa == ddi_get_instance(dip) is a hard requirement from 
somewhere. I think I filed a PSARC case for it years ago. I'll go look...




I think it's this one... It's not open though so I can't look in the 
case file :-(


PSARC/2003/375: NIC Driver Name Constraints II

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] dladm and private link properties

2009-07-06 Thread Paul Durrant

Garrett D'Amore wrote:


That decade of experience has led to network administration with Solaris
reaching the breaking point (pre-dladm).  I don't want to go back there.
  


Unquestionably, ndd was a mess. But just yanking the rug out from under 
NIC driver developers, customers, and PAE doesn't sound like a good 
solution either.




Absolutely. If you leave a vacuum, something will fill it.

  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] dladm and private link properties

2009-07-02 Thread Paul Durrant

Peter Memishian wrote:


I'm all for fixing that.  (I'd also be interested to hear the reasons why
an administrator would need to e.g. disable LRO that are not workarounds
for other deficiencies in our stack.)



Disabling LRO is probably only necessary if you believe there is a 
problem with the stack, or you want to comparatively benchmark. There 
are also a couple of policy tweaks, though: whether you honour PSH 
boundaries, and whether you go all the way up to 64k packets or not. 
Both of these can have positive or negative effects on some benchmarks 
(usually when you have a Windows box on the other end doing LRO too).


Receive side scaling tweaks may need to be made, though, to try to limit 
the interrupt load to a subset of CPUs to avoid hitting apps. too much. 
There's also interrupt moderation to consider; all h/w tends to do this 
in subtly different ways.


All-in-all I think it's an excellent idea to remove as many tunables as 
possible, but when new h/w comes along it's generally necessary to add 
them because existing APIs/features within the stack don't quite fit the 
model. Those can be modified and the tunables eventually removed, but 
this is an iterative process and thus, I believe, Solaris should have 
good support for dynamic driver tunables rather than trying to deny 
their existence. All you're going to achieve by not supporting them well 
in dladm is to force IHVs to write their own tools to play with a 
private IOCTL interface... which is much, much worse.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] dladm and private link properties

2009-07-01 Thread Paul Durrant

Peter Memishian wrote:

If they
are not that generic, I'd encourage driver writers to explore other
options like auto-configuration.  I know this isn't always possible, but
the answer is *not* to build a wet bar in the private property lifeboat.



Oh, they generally are generic enough in my experience... e.g. LRO, 
receive side scaling, etc. The problem is that Solaris has no APIs to 
support them, and the entry bar for a driver developer getting a new 
API back into OpenSolaris is just too high. It's difficult enough to 
even get a driver in... I tried for 6 months and failed.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] Run GLDv3 driver on Solaris 10

2009-06-25 Thread Paul Durrant

Tom Chen wrote:


I am wondering whether a GLDv3 driver can run on Solaris 10? We upgraded
ON bits on OpenSolaris to build a GLDv3 driver; is the driver workable
on Solaris 10? Must we use the GLDv2 interface? What's the limitation for
a GLDv3 driver? Are MC_SETPROP & MC_GETPROP supported, i.e. using dladm
setprop/getprop to set/get driver properties?



What do you mean by Solaris 10? Which update?

GLDv3 is (still) unpublished. This means that you have no guarantees 
that a driver built using current ON headers (remember that the GLDv3 
headers are not in /usr/include) will be compatible with even the most 
recent Solaris 10 update. In fact, I'm pretty sure it won't be.

Your only safe choice with GLDv3 is to build the whole of ON and BFU your box.

  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] profile the driver

2009-06-24 Thread Paul Durrant

Pradeep wrote:

I am developing a GLDv3 network driver, and I need
some information about profiling the driver. Is there any tool
or method available in Solaris which can profile the driver, like
oprofile on Linux? All I want to know is which function is using
how much percentage of CPU.




driver-discuss is probably more appropriate for this discussion, but you 
should start by looking at lockstat (-I and -i options) and the dtrace 
cpc provider.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] stale networking-related PSARC cases

2009-06-18 Thread Paul Durrant

Andrew Gallatin wrote:


2003/146  ClearWater: a faster socket layer implem  David 
Edmondson   unset   inception held 05/14/03   Ralph 
Campbell/Carol


What happened to this one?  I read the psarc, and it sounds interesting.
Was it overtaken by events, or does the problem it tries to solve
still exist?



It was indeed overtaken by events: the lay-off of the UK kernel 
engineering team in 2005.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] Project Proposal.

2009-06-15 Thread Paul Durrant

Brian Utterback wrote:
Due to the recent changes to the Project process, I find that the NTP 
project is now a project without portfolio. Although the NTP project 
recently delivered the upgrade of NTP to version 4, there were some 
features that were deferred (sntp, ntpsnmpd)  in addition to new 
features we would like to develop (e.g. making NTP Solaris priv aware).  
As such, I would like to get the endorsement of the networking community 
so that the NTP project doesn't go poof!



+1 from me

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] mac_alloc() failing

2009-06-09 Thread Paul Durrant

Somnath kotur wrote:


I even saw the source code for mac_alloc() and, apart from checking for
the above MAC_VERSION, it just does a kmem_alloc() and returns the
memory. I attempted this on 2 different systems with the same OS
installed and it failed on both. Wondering if it's anything to do with
my installation?



I think that's exactly the problem. How are you compiling? I assume you 
must be including headers from a nevada ON source tree since I don't 
think the MAC module headers are exported into /usr/include/sys. Thus, 
you're probably building a driver with a MAC_VERSION that differs from 
the MAC module on your target installation.
You should be testing on a box installed with the latest dev. update of 
OpenSolaris or installed with SXCE.


  Paul


Re: [networking-discuss] Buffer management (was GLD3 NIC driver performance tuning)

2009-06-02 Thread Paul Durrant

Min Miles Xu wrote:


I've been working on buffer management on the rx side, consolidating 
the DMA rx buffer pool of all the GLD instances (port/ring). The driver 
can simply ask the framework for a number of buffers, use them, and 
pass them up to the stack. It's the framework's responsibility to 
recycle each buffer returned, so everything is transparent to the 
drivers. Another prominent advantage of doing so is that the buffers 
can be shared among instances. New Intel 10G NICs have 128 rings; the 
existing way of allocating buffers for each ring is a big waste of memory.
I already have a prototype for e1000g and ixgbe, but I need some more 
time to conduct experiments and refine it. Then I will hand it out for 
review. The code to be integrated may be applied to ixgbe only at 
first, then to other NIC drivers.




How do you keep the buffers DMA mapped between uses? Is the driver still 
responsible for DMA mapping?


  Paul


Re: [networking-discuss] Buffer management (was GLD3 NIC driver performance tuning)

2009-06-02 Thread Paul Durrant

Andrew Gallatin wrote:


Isn't the overhead to map/unmap the buffer fairly high?  Why don't
you keep the buffer mapped?



That was what I was getting at. If the pool is shared amongst drivers 
then the buffer has to be unmapped before being recycled (in case it's 
allocated by a different driver) thus blowing away most of the advantage 
of loanup.


  Paul


Re: [networking-discuss] Architectural requirements behind lib*adm...

2009-04-08 Thread Paul Durrant

Darren Reed wrote:

On  7/04/09 02:05 AM, Paul Durrant wrote:

...
The majority of code, rather than being put in usr/src/cmd/dladm.c was 
put in usr/src/lib/libdladm.c to create an API so that other tools 
could use it if so desired.


Was there any input from anyone outside of the team that created dladm 
as to how those functions should be named and the interfaces designed?




Nope. The initial functionality of libdladm was minimal. We had quite a 
bit of input as to how the various subcommands of dladm should be named 
and I tended to follow that guidance in naming the functions in libdladm 
i.e. there was basically a one-to-one mapping between dladm subcommands 
and libdladm functions.


 Was there any thought as to what those other tools might be or how it
 would be expected for them to use that library?


My only thought at the time was that there might be a GUI one day.

  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] Architectural requirements behind lib*adm...

2009-04-07 Thread Paul Durrant

Darren Reed wrote:

Are there any requirements, in the form of documents or emails from
whoever that spell out what the requirements are for the various
lib*adm libraries that we've delivered and are continuing to deliver?



libdladm was an original component for Nemo.

It was required because Nemo, in its original form, did not do any 
in-kernel auto-creation of DLPI interfaces. dladm was introduced as the 
layer 2 equivalent of ifconfig (the dlconfig name was objected to at the 
time, but I can't recall why), which was to be called at network/physical 
SMF time to create the DLPI interfaces for ifconfig to plumb.
dladm could then be used subsequently to create VLAN interfaces (which 
were essentially no different from 'normal' interfaces, other than that 
they had a non-zero tag value associated with them, which caused the DLS 
code to insert/strip VLAN headers).
The majority of the code, rather than being put in usr/src/cmd/dladm.c, 
was put in usr/src/lib/libdladm.c to create an API so that other tools 
could use it if so desired.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [osol-discuss] SXCE 109 kernel panic

2009-03-11 Thread Paul Durrant

Gino wrote:


# mdb -k unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic 
cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs sd mpt fcp fctl qlc sockfs 
ip hook neti sctp arp usba stmf idm md cpc random crypto smbsrv nfs nca fcip 
lofs logindmux ptm nsctl sdbc ufs sv ii sppp nsmb rdc ]

$c

ip_tcp_input+0x6a(0, 0, ff04dc34d068, 0, ff052e7db408, 0)
ip_accept_tcp+0x7cf(ff04dc34d068, ff04e9519088, ff04e2cddc40, 
ff04e80a3bc0, ff001fb94be8, ff001fb94be4)
squeue_polling_thread+0x13f(ff04e2cddc40)
thread_start+8()



It looks like ip_tcp_input() has been called with a NULL mblk and ipha. 
cc-ing network discuss to see if this looks familiar to anyone. As a 
matter of interest, what network driver are you using?


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [brussels-dev] per-interface tunables in ipadm/ndd

2009-03-10 Thread Paul Durrant

sowmini.varad...@sun.com wrote:


None of these options is ideal, but #2 seems to make the best of
a bad-deal. Thoughts?



How about deliberately making sure that the per-interface props. are 
named differently from the global props. already available via ndd, and 
giving per-interface props. the possibility of being undefined. Then:

- if a per-interface prop. is undefined, the global prop. value is used 
for that interface
- ndd only reports global values, and per-interface prop. values have to 
be accessed via ipadm


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [brussels-dev] per-interface tunables in ipadm/ndd

2009-03-10 Thread Paul Durrant

Girish Moodalbail wrote:
Well what if the end user sets the property using ipadm on all 
interfaces, as in:


ipadm set-prop -m ip ip_def_ttl=128

Note: currently, with 'ipadm' you can either specify an interface name or 
not. If you specify the interface name, the property is set for that 
interface only; if you don't, it is set for all the interfaces.


In that case we should allow it to be effected on all the interfaces, 
right? Shouldn't that hold good with 'ndd' too?




I guess that would be a 'nice to have' but I wouldn't see it as a 
requirement that an old tool reflects configuration set by a new one.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] A ques tion: can be avoid using ´bcopy´ in Tx of the NIC dr iver?

2009-03-04 Thread Paul Durrant

Brian Xu - Sun Microsystems - Beijing China wrote:
Are there any existing docs on this project? I am interested in them and 
I'd like to have a look.




I think the point is that there isn't a project, but there probably 
should be. New mechanisms for optimizing DMA are up for grabs for anyone 
who cares to pay them attention, as they have been for a long time (as 
PAE will tell you).


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] Skipped STREAMS module?

2009-03-04 Thread Paul Durrant

Erik Nordmark wrote:


If your module NAKs all DL_IOC_HDR_INFO ioctls then you will only see 
DL_UNITDATA_REQ M_PROTO messages and no M_DATA.




... and, of course, you'll kill performance. Actually, just putting the 
module there will probably kill performance, because the presence of a 
module between IP and layer 2 will prevent much of the optimization that 
Nemo introduced and Crossbow added to.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] A ques tion: can be avoid using ´bcopy´ in Tx of the NIC dr iver?

2009-03-03 Thread Paul Durrant

Mark Johnson wrote:


This is a good area to investigate. I don't believe a new
DDI interface is the way to approach it though (e.g.
ddi_dma_mblk_bind_handle()). It should be an option to be
done for you in gld. i.e. gld gives you a list of cookies
that fit within your dma constraints (dma_attr).



Why not put it in the DDI? There's nothing GLD specific about DMA.



There's more to it than just caching a PFN though. The real
solution is to bring the rx and tx code path optimizations
around buffer management into a common piece of code (I know
it sounds blue sky :-), but it's what's needed long term).
It's more important to bring all the NICs up to a consistent
level. I would expect different code paths for both different
platforms and different NIC properties.



The diversity of NIC h/w is the reason why the goal of common codepath 
optimization is probably not realistic. I've come across many NIC 
drivers, and the schemes for driving different chips are usually too 
diverse; it would be possible to make code common, but it would just be 
equally bad on all chips.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] A ques tion: can be avoid using ´bcopy´ in Tx of the NIC dr iver?

2009-03-03 Thread Paul Durrant

Andrew Gallatin wrote:

Garrett D'Amore wrote:

Without bcopy, you might have to allocate more IOMMU entries.  Its a 
bigger problem on the rx path when you do loanup and buffer recycling 
(using esballoc), but even on the tx side, if you have a packet that is 


Drivers only resort to loanup *BECAUSE THE SOLARIS MECHANISM TO GET
A DMA ADDRESS SUCKS SO BADLY*.  I apologize for shouting, but
I simply cannot emphasize this enough! I'd love to make my
Solaris driver like my drivers for *EVERY OTHER *NIX I SUPPORT*
and eliminate loanup, and allocate rx buffers on the fly.  Loanup
sucks.  But I cannot, because the DDI framework is so expensive.


Sun knew this more than a decade ago, which is why pretty much all sparc 
network drivers used dvma_kaddr_load() instead. There's never been any 
will to do anything about it, though; the fact that 
ddi_dma_addr_bind_handle() had to *allocate* TTEs as well as fill them 
in always struck me as wrong.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] A ques tion: can be avoid using ´bcopy´ in Tx of the NIC dr iver?

2009-03-03 Thread Paul Durrant

Mark Johnson wrote:



Why not put it in the DDI? There's nothing GLD specific about DMA.


Who else would use it besides NICs?



Who knows? I just don't see any point in trying to be restrictive 
though. STREAMS blocks and associated functions are part of the DDI so 
why not provide DMA mapping functions specific to them in the DDI?




The diversity of NIC h/w is the reason why the goal of common codepath 
optimization is probably not realistic. I've come across many NIC 
drivers, and the schemes for driving different chips are usually too 
diverse; it would be possible to make code common, but it would just be 
equally bad on all chips.


I certainly can see difficulties on the rx side. It would
be interesting to hear about a case which was a one off one
though.. I would imagine you would be able to bin most of
them.



My driver has a relatively complex receive path because I do LRO in s/w. 
Thus I have to maintain flow tables and coalescing state; I agree that 
this code looks like a candidate for making common, but my h/w generates 
a hash which can be used to optimize flow lookup; can that be made 
common? Also, code placement causing i-cache misses turned out to be 
significant, and I even went to the trouble of using some pretty 
cumbersome macros to try to tune out function call overhead; can that be 
made common?



I'd be interested in examples on the TX side where this
is the case.



Commonality on the TX side could be defeated by fragmentation rules. 
Soft LSO would seem like a good thing to make common, but can you do it 
in such a way that it can handle h/w with a restriction that fragments 
not aligned on a 16-byte boundary cannot exceed 512 bytes in length? 
(Yes, the h/w is broken, but show me a piece of h/w that is not.) A 
driver writer's job is to cover up mistakes in h/w design :-)


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] A qu estion: can be avoid using ´bcopy´ in Tx of the NIC driver?

2009-03-03 Thread Paul Durrant

Andrew Gallatin wrote:


This means we invent a new mblk data allocator which allocates buffers
which are optimized for network drivers.  On IOMMU-less systems,
these buffers have their physical address associated with them
(somehow).  On IOMMU-full systems, these buffers are a static pool
which is pre-allocated at boot time, and pre-mapped in the IOMMU.
Rather than their physical addresses, they have their IOMMU address
associated with them.  Drivers continue to use the existing DDI DMA
interface, but it just gets much, much faster because 90% of the code
path goes away for data blocks allocated by this fastpath.  There are
a few cases where this can break down (multiple IOMMUs per system,
IOMMU exhaustion), and then things would only devolve to what we have
now.



I suggested much the same scheme years ago, and even have a 
PSARC spec. for it, although I think my plan was to defer mapping 
through the IOMMU to first use... It was a scheme where each block of 
DMAable memory had an associated cookie which could be passed around 
with it. The cookie could then be used to cache mapping info., and 
cookies could also be 'remapped', which would optimize to essentially a 
dvma_kaddr_load() on systems with IOMMUs. The current DDI interface is 
essentially non-optimal on all platforms, as opposed to being optimal on 
some.


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] A ques tion: can be avoid using ´bcopy´ in Tx of the NIC dr iver?

2009-03-03 Thread Paul Durrant

Mark Johnson wrote:


I agree about the HW broken comment :-) But I'm not sure how
you are restricted by getting a scatter/gather list passed
down to the driver too.. i.e. you have the VA, size, and SGL.
If you need to carve up the SGL, then go for it.



It's a question of information 'impedance mismatch', as I once heard 
someone put it. You need to be sure that whatever routine is doing any 
sort of memory carve-up/placement has all the info. it needs to do a 
good job. I'm just not convinced that you'd ever be able to abstract 
stuff away from the h/w driver without losing vital info. in the process.
I think we did a reasonable job in Nemo of abstracting away what we 
could, but one has to be very careful not to go too far and kill 
performance. I don't know how much testing is done in Sun's kernel 
networking group using 10G NICs; if it's anything like it was when I was 
there, though, the answer will be close to zero!


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] [driver-discuss] code review request - VIA Rhine Fast Ethernet driver

2009-02-20 Thread Paul Durrant

Joost Mulders wrote:


No, you need to guard against race conditions.  That implies locking.

This area is really tricky to get right, and almost every driver that 
has ever tried to do this has gotten it WRONG.


The extra overhead this implies is painful.  I'd *really* recommend 
you consider just using bcopy.


What driver does it right then? A correct example would be useful.



It looks like there are quite a few network drivers in ON that use 
desballoc(). One assumes that, since they are integrated, they get it 
right! nxge is the example I picked to look at.


FWIW I'm trying to integrate a 10G Ethernet driver at the moment. I use 
a derivative of desballoc() called xesballoc() (which I created myself). 
You can see my code at 
http://cr.opensolaris.org/~pdurrant/webrev/usr/src/uts/common/io/sfc/sfxge/sfxge_rx.c.html


  Paul

--
===
Paul Durrant
http://www.linkedin.com/in/pdurrant
===


Re: [networking-discuss] webrev: M_READ fix for nonblocking I/O

2009-01-20 Thread Paul Durrant
Garrett D'Amore wrote:
 I'm looking for reviewers for the above fix.  The change is relatively 
 small, and should only take a couple of minutes to review.
 
 http://cr.opensolaris.org/~gdamore/mread/
 
 (I've verified this fix with Boomer audio  -- M_READ now gets properly 
 delivered even for nonblocking I/O.)
 

LGTM :-)

   Paul


Re: [networking-discuss] Something to ponder for NIC administration

2009-01-19 Thread Paul Durrant
Andrew Gallatin wrote:
 
 But in general, I agree there should be some way to change what
 stateless offloads are in use at runtime.   Solaris is way behind
 here.  For example, the BSDs do it via ifconfig (to disable
 TSO on mxge0: ifconfig mxge0 -tso), Linux does it via the
 horribly cryptic ethtool (to disable TSO on eth2:
 ethtool -K eth2 tso off).
 

You now have dladm on Solaris for similar functionality. (And there's 
always ndd if you can't do a GLDv3 driver).

   Paul


Re: [networking-discuss] Something to ponder for NIC administration

2009-01-19 Thread Paul Durrant
Andrew Gallatin wrote:
 You now have dladm on Solaris for similar functionality. (And there's 
 always ndd if you can't do a GLDv3 driver).
 
 Sure, but doesn't the driver need to have those features in place
 at attach time?  Or is there a way to get your m_getcapab() called
 again somehow?
 

That's true; I was not thinking of LSO specifically. Actually, I have 
not looked at when m_getcapab() is called these days - it may be that 
you only have to unplumb/replumb the stack to enable TSO.

   Paul


Re: [networking-discuss] Something to ponder for NIC administration

2009-01-19 Thread Paul Durrant
Min Miles Xu wrote:
 That's true; I was not thinking of LSO specifically. Actually, I have 
 not looked at when m_getcapab() is called these days - it may be that you 
 only have to unplumb/replumb the stack to enable TSO.
 
 That may not be true. m_getcapab() seems not to be called again when the 
 interface is unplumbed/plumbed.

That sounds like a bug. The stack should really check each time it 
plumbs; or provide an interface for TSO enable/disable as Drew mentions 
that other OSes do.

   Paul


Re: [networking-discuss] Something to ponder for NIC administration

2009-01-19 Thread Paul Durrant
Andrew Gallatin wrote:
 FWIW, unplumb/plumb is still not much better than the current situation
 of removing/adding the driver.  Every other OS lets you change these
 settings on the fly, potentially without even taking the NIC down.
 (assuming the NIC hardware doesn't require a reset; and even if it
 does, that reset happens without administrator intervention).
 
 I think there needs to be a m_setcapap() or something similar
 to notify drivers of what capabilities are enabled.
 

That wouldn't be necessary; the m_setprop() entry point could be used to 
turn on LSO in the driver. There just needs to be a way of getting the 
stack to notice - as with an MTU change.

   Paul


Re: [networking-discuss] What is zero-copy safe

2008-12-18 Thread Paul Durrant
Sumit Gupta wrote:
 Inside gld_mac_info->gldm_capabilities, there is a flag defined as 
 GLD_CAP_ZEROCOPY. Doing some research, it seems to map to 
 DL_CAPAB_ZEROCOPY, which means that the NIC driver is zero-copy safe. 
 What is the meaning of zero-copy safe?
 

The meaning may have changed, but when this capability was introduced 
it meant that the driver guaranteed to complete transmits 
in a timely fashion.
The problem is that if you do a sendfile() and the network stack simply 
maps the file and wraps it in a set of STREAMS blocks then it must 
obviously hold a reference on that file; and if a driver holds onto the 
STREAMS blocks on its transmit side then it is essentially holding that 
file to ransom. So, if the driver exports the GLD_CAP_ZEROCOPY 
capability then it is essentially declaring that it is not going to hold 
onto transmitted blocks for an unreasonable amount of time (i.e. 
significantly longer than it takes to put them on the wire) and thus 
will not hold a file to ransom in this way.

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] What is zero-copy safe

2008-12-18 Thread Paul Durrant
m...@bruningsystems.com wrote:
 In case you are unclear about the concept of zero copy, there is a paper 
 (very dated
 at this point) by Jerry Chu describing zero-copy tcp on solaris (2.6, I 
 believe) at:
 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.3960
 

Indeed. I discussed the implementation with Jerry since I was 
responsible for maintenance of the GLD code at the time. I am only 
unclear as to whether this capability has since acquired other 
interpretations.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] why my gldv3 driver plumb fail?

2008-10-30 Thread Paul Durrant
Tom Chen wrote:
 
 Thanks! but could you tell me how to get a non-DEBUG kernel? When my 
 Solaris server boot up, a few seconds later, I see DEBUG on the 
 screen. But the Solaris Express that I can download at 
 http://opensolaris.org/os/downloads/ is always DEBUG kernel. How can I 
 disable the DEBUG mode? Is there any command to disable it? or should I 
 use non-debug ON BFU binaries to install ON bits?
 

If you're developing a driver, and you're not testing performance then 
you're probably better off sticking with a DEBUG kernel for now. DEBUG 
vs. non-DEBUG is not a boot option, it is an entirely different build of 
ON so, if you do want to use a non-DEBUG kernel, then BFUing using a 
non-DEBUG archive is probably the quickest option. I've never used the 
OpenSolaris distro though (I only use SXCE) so I'm not sure whether you 
can BFU it.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] why my gldv3 driver plumb fail?

2008-10-29 Thread Paul Durrant
Tom Chen wrote:
 
 I am testing my gldv3 driver on the very latest OpenSolaris nv99 OS.
 I encountered an issue, my driver, qla, though can be successfully
 detected and loaded by OS, always fail to plumb.
 

Have you rebuilt your driver with nv99 headers? GLDv3 is not a stable 
interface and has probably changed incompatibly since June/July.
If you have re-built, I suggest using dtrace to watch what occurs with 
your GLDv3 driver entry points; one of them is probably returning a 
failure code and thus plumbing is failing.
The detach following the abortive plumb is probably simply due to a 
modunload -i 0 or the background thread that runs on a DEBUG kernel.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] why my gldv3 driver plumb fail?

2008-10-29 Thread Paul Durrant
Tom Chen wrote:
 
 I have installed ON bits on nv99 and thus I can build my GLDv3 driver. 
 What is the background thread that runs on a DEBUG kernel ? Could you 
 explain?
 

Check out:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/modctl.c#3675
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] A error in using BFU to install ON

2008-10-27 Thread Paul Durrant
Peter Memishian wrote:
 
From a quick look at the bfu script, looks like there's a new environment
 variable you need to set:
 
   EXTRACT_HOSTID=/opt/onbld/bin/i386/extract_hostid
 

Why on earth cannot bfu be modified to search under its own 'bin' [*] 
directory rather than resorting to a SWAN-only server? The plethora of 
environment variables that one must set before invoking it off-SWAN is 
very annoying.

   Paul

[*] I.e. if it is invoked from /net/onnv.eng/export/onnv-gate/public/bin 
then use binaries under /net/onnv.eng/export/onnv-gate/public/bin/i386; 
otherwise if it is invoked from /opt/onbld/bin use binaries from 
/opt/onbld/bin/i386.

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] bad LSO ill state..

2008-10-16 Thread Paul Durrant
Xiu-Yan Wang wrote:
 
 The ire entry is not maintained correctly during the plumb/unplumb
 process. The packet destined to your driver was sent out through bge
 interface and bge does not check the packet length and it tries to copy
 the large LSO packet to it's own buffer. 6586787 has been filed for
 this.
 

Probably explains why I didn't see it. My on-board network chips use the 
bnx driver rather than bge, which I'm guessing is better behaved (... 
and I can only guess because it is not open source :-()

   Paul


___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] bad LSO ill state..

2008-10-14 Thread Paul Durrant
Andrew Gallatin wrote:
 
 The plumb/unplumb tests in NICDRV are killing me..  I've got a strange
 memory corruption bug I'm still tracking that I may end up asking for
 advice on..
 

My only other problem with them was that they seem to expect the 
netperf/netserver processes to always succeed; which in the face of an 
unplumb isn't necessarily the case. I had no other driver problems.
If you're getting corruption in your driver then you may shed some light 
on it by setting kmem_flags to 0xf in your /etc/system file and 
rebooting before running your test.
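For reference, the setting mentioned above, as it would appear in /etc/system (it takes effect after a reboot):

```
* /etc/system -- enable full kmem debugging (audit, deadbeef, redzone, contents)
set kmem_flags=0xf
```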

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] bad LSO ill state..

2008-10-14 Thread Paul Durrant
Andrew Gallatin wrote:
 
 I should have checked this myself.  I'm seeing 5-10 packets where
 HW_LSO is set, but HCK_PARTIALCKSUM is not set when the interface
 is brought down.  This is  nonsensical for my NIC and driver..
 
 What did you end up doing to fix the problem?  I suppose it is harmless
 for you, since if you do full checksum, your NIC will just re-write
 the checksum anyway, and not corrupt the checksum like mine.
 

It was a problem for me because I try to honour the checksum bits so I 
ended up sending the LSO to a non-checksummed h/w queue and the driver 
then got very confused about getting an LSO packet on that queue. So, in 
the end I just decided these packets were nonsensical and dropped them.

 BTW, I also saw this via NICDRV.
 

Good; It wasn't just me then ;-)

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] bad LSO ill state..

2008-10-13 Thread Paul Durrant
Andrew Gallatin wrote:
 
 So.. Is this a bug in Solaris, or should I just hack the driver?
 

Is HCK_PARTIALCKSUM set in the packet flags of the LSO segments for 
which you see the problem, or are you getting large packets coming 
downstream with LSO set, but no checksum offload flags? I ask, because I 
think I saw something similar when I ran NICDRV.
My code relies on HCK_FULLCKSUM being set for LSO segments. It makes no 
sense for LSO segments to come downstream without some form of requested 
checksum offload.

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] ftp and ping together causing problems

2008-10-08 Thread Paul Durrant
Pradeep writes:
 I am presently developing a GLDv3-compliant network driver.
 I am facing a strange problem: when I do an ftp and a ping together I am seeing the 
 following warning message in the ping program.
 ICMP Fragmentation needed and DF set from gateway 178.122.70.9 
for tcp from 178.122.70.9 to 178.122.70.56 port  34234

A classic reason for this would be if the stack were sending down LSO 
packets to your driver but your driver/hardware were not fragmenting 
them down to MTU size packets. I.e. you really are sending 64k (or 
whatever LSO limit you have) packets out on the wire.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] please help = network configuration !!

2008-10-02 Thread Paul Durrant
onkar wrote:
 sol1:~# netstat -rn 
 
 Routing Table: IPv4
   Destination           Gateway               Flags  Ref     Use  Interface 
   --------------------  --------------------  -----  -----  ----  --------- 
 127.0.0.1             127.0.0.1             UH         1      8   lo0   
 
 Routing Table: IPv6
   Destination/Mask      Gateway               Flags  Ref     Use  If 
   --------------------  --------------------  -----  -----  ----  ----- 
 ::1                   ::1                   UH         1      0   lo0 
 
 
 [EMAIL PROTECTED]:~# ifconfig -a 
 lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 
 index 1
   inet 127.0.0.1 netmask ff000000 
 nge0: flags=201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
   inet 192.168.1.17 netmask ffffff00 broadcast 192.168.1.255
   ether 81:ea:66:a0:1a:0 
 lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 
 index 1
   inet6 ::1/128

Your nge interface is not 'up', which is why it does not appear in the 
routing table. Try 'ifconfig nge0 up'.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] please help = network configuration !!

2008-10-02 Thread Paul Durrant
onkar wrote:
 # cat /etc/defaultrouter
 192.168.1.1
 # cat /etc/hostname.nge0 
 192.168.1.17
 
 still I am not able to ping to 192.168.1.1
 
 Waht else might be going wrong ?

Have you run 'snoop -d nge0' to see if you're getting packets through. 
At this stage I'd say the problem is likely to be with the physical link.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Interesting = MAC addr reversing manually solved the problem !!

2008-10-02 Thread Paul Durrant
onkar wrote:
 I did this = 
 
 ifconfig nge0  ether   MAC addr reversed 
 ifconfig nge0 dhcp start
 

That sounds like your DHCP server is misconfigured.

 now I am able to ping 192.168.1.1 (router)
 
 but , I am not able to ping www.google.com
 

That sounds like your DNS server is missing or misconfigured.

In either case I don't think there is a problem with your OpenSolaris 
installation.
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] how can i just capture packet to my interface

2008-09-25 Thread Paul Durrant
smzlkimi wrote:
 I wrote a dlpi program to capture packet,it capture all packets to/from 
 my interface,How to just capture packets to my interface
 

It might be helpful if you posted some details of your program.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] raw socket problem

2008-09-09 Thread Paul Durrant
smzlkimi wrote:
 I copied the code below from 'A brief programming tutorial in C for raw 
 sockets'. I try to send TCP packets to 192.168.0.155;
 however, it seems no packets were sent, because when I run snoop on 
 192.168.0.155, only two packets were captured (arp, 
 who is 192.168...) and no TCP packets were captured.
  
 why?

Were there any ARP responses? If the stack does not know who 
192.168.0.155 is then it will not send any TCP packets.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] [driver-discuss] MII/GMII Support for mac_ether

2008-07-17 Thread Paul Durrant
Steven Stallion wrote:
 mac.h
 -
 
 I added an enum for link_speed_t; this was made since we already
 maintain the link_duplex_t and link_state_t. link_speed_t is very useful
 when setting a specific speed via MII.
 
 The IFSPEED macro was added as an afterthought; I find it generally
 annoying that the ifspeed kstat typically is reported using a magic
 multiple (100). *val = IFSPEED(foop->speed) is much clearer than
 *val = foop->speed * 100.
 
 mac_ether.h
 ---
 
 This contains the bulk of the changes. I've added what I thought the
 minimal MII/GMII interface should be. There are of course a number of
 features supported by MII which probably aren't useful in the day-to-day
 (i.e. collision test, loopback, and so forth).
 

Do the mii_read/mii_write entry point names make sense? There don't seem 
to be enough arguments. The interface across which you talk to a 
MII/GMII PHY is usually clause 22 MDIO (and a driver could make it look 
like this even if it isn't), so you need function prototypes taking a 
context, a register offset and then a data buffer to read into or write 
from. For clause 22 the register space is 8 bits wide and the data 16 
bits wide. Also, I'd call them mdio22_read/mdio22_write to make it clear 
what their purpose is.
Once you get to PHYs talking XGMII or XFI you're likely to need clause 
45 MDIO and that requires an extra 'MMD' argument, which is 8 bits wide, 
and the register space increases to 16 bits wide.

Also, since the MII/GMII register space is an IEEE standard I'd prefer 
to see enumerations for the register offsets. As for 10G PHYs, each MMD 
has an IEEE register space (< 0x8000) and a vendor specific register 
space (>= 0x8000) so you could also partially enumerate those.

Do you have implementations for the mii functions you have? Also be 
aware that not all link modes available through a given PHY will be 
supported by the MAC it is attached to (e.g. 1G half-duplex, which is 
not required by the IEEE IIRC) so you need to make sure the driver can 
veto things appropriately.

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] SO_WROFF for TCP connections

2008-06-19 Thread Paul Durrant
Andrew Gallatin wrote:
 I *think* the offset left at the front of mblks may be causing
 a performance problem in my gldv3 driver, and a way to zero
 out this offset (so mp->b_rptr in payload segments is right at
 the front of dblk) would help me see if this is true.

Is this a 64-byte alignment issue? I can't see why else having 
mp->b_rptr == dbp->db_base would have any effect.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Regarding implementation of MSI-X interrupts In Solaris 10 8/07

2008-06-18 Thread Paul Durrant
Pradeep wrote:
 
 uname -a
 SunOS unknown 5.10 Generic_120012-14 i86pc i386 i86pc

There's your problem; you're using Solaris 10.

a) Solaris 10 does not support MSI-X on x86 AFAIK. (Others may wish to 
correct me here).
b) This is not a Solaris 10 discussion group, it's an OpenSolaris 
(Nevada) discussion group.

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] LRO Implementation.

2008-06-05 Thread Paul Durrant
Kacheong Poon wrote:
 As I said in a previous mail,
 the designer of LSO just thought that something
 never changes, which is simply false.


Why is that false? LSO is only supposed to be used in cases where things 
*aren't* changing... it's a common case optimization, not a panacea.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] LRO Implementation.

2008-06-04 Thread Paul Durrant
Kacheong Poon wrote:
 As Jim stated, the question is whether we want to do
 the above given the already known problems.  For example,
 suppose TCP wants to do better PMTUd and wants to change
 the segment size on the fly.  In order to recover faster
 in case PMTU has not changed, it decides to send alternate
 small and big segments.  I think the above GLD LSO scheme
 will not allow this easily.

Why? If the DB_LSO flag is not set then the driver/hardware must not 
fragment the segment. Hence the stack can send segments of any size it 
likes and not have the driver/hardware interfere.

   Paul

  TCP will need to do multiple
 sends just like today.  And I guess the above GLD LSO
 scheme still won't solve the issues I gave in my previous
 email.  So maybe we can just do the simple thing and forget
 about this GLD LSO thingy.  And just make the code path
 simple and quick enough.
 

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] LRO Implementation.

2008-06-03 Thread Paul Durrant
Kacheong Poon wrote:
 
 For example, suppose we want to use the TCP MD5 option.  I
 think it will work nicely with MDT.  But I don't know if there
 is (or ever will be?) a hardware out there doing LSO which can
 fill in the MD5 option value.  As another example, MDT can
 support the ECN nonce in RFC 3168 nicely.  I doubt that there
 is a hardware out there doing LSO that can do ECN nonce.
 

Who says it's only h/w that implements LSO? It's perfectly reasonable to 
do the fragmentation in s/w; it still buys a lot of performance.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] LRO Implementation.

2008-06-03 Thread Paul Durrant
Kacheong Poon wrote:
 
 Well, I am not trying to say that MDT is the best design.
 But the way current LSO works does not give the transport
 protocol the level of control it needs to do those examples
 I gave in my previous mail.  What is your proposal to design
 a MDT like software LSO which gives the transport protocol
 this level of control?
 

I don't have one. I was merely pointing out that one should not be 
concerned about the intertia of h/w implementation if one wishes to try 
to extend LSO to cope with the examples that you stated.

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] LRO Implementation.

2008-06-03 Thread Paul Durrant
Kacheong Poon wrote:
 Paul Durrant wrote:
 
 Who says it's only h/w that implements LSO? It's perfectly reasonable 
 to do the fragmentation in s/w; it still buys a lot of performance.
 
 Isn't this what MDT does?
 

Yes. The only difference, of course, is the message format. LSO 
fragmentation code is generally more portable between OSs; MDT is highly 
Solaris specific.

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] LRO Implementation.

2008-05-28 Thread Paul Durrant
Yunsong (Roamer) Lu wrote:
 LRO is not yet supported in the stack, and there will be some other 
 issues than the b_cont check in IP.
 
 But experimentally LRO implementation in driver works after fixing the 
 check against maximum 2-block of message. You may apply this *hack* to 
 try your LRO implementation if you can compile IP module for your 
 Solaris version.
 

Roamer,

There's also the mblk_pull_len issue (there is a bug for this but I 
can't find the number right now). At the moment it needs to be set to 
zero to avoid unnecessary copying in the stack.
BTW, I've tested my LRO implementation with netperf on my driver. It 
does not affect performance of a single stream test (I can get 10G line 
rate regardless) but the CPU idle % on the netserver end (on the 
interrupted CPU) goes from ~5% to ~50% i.e. LRO saves nearly half a core 
of CPU time. Understandably then, it's quite important to getting good 
performance on multi-stream tests.
Any word on when we may see these stack issues fixed?

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Packet reassembly

2008-04-29 Thread Paul Durrant
Jason King wrote:
 The driver I'm working on (cge) supports hw reassembly of recognized
 flows (I think this might be considered LRO, but I'm not sure -- I'm
 relatively new at this).  Directly from the driver, you get a buffer
 for each header of the packet, then the data portion for all those
 packets (of the same flow) assembled into a single contiguous buffer.
 The obvious thing is to create an mblk chain, with 2 fragments for
 each packet (1 for the header, 1 for the data), and then pass the
 entire chain up in a single mac_rx() call.  Since the data is
 contiguous, are the upper layers (i.e. tcp/ip) able to detect the
 condition and take advantage of the situation, or does the driver need
 to do more work (such as collapsing it down to a single mblk, possibly
 with 2 fragments), or does it require some sort of signal/flag to
 indicate such a condition?

No, you should be ok with 2 data blocks split between header and data. 
Be aware that if you split into more than 2 blocks then you'll need to 
patch the ip module to avoid hitting a slow path. (See the recent thread 
on LRO).

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] MII/GMII Kernel Module

2008-04-22 Thread Paul Durrant
Garrett D'Amore wrote:
 
 IMO, plenty of value.  Though the number of new drivers that could take 
 advantage of it is probably not too great -- not too many new 100M/1G 
 chips are entering the market now -- everything seems to be focused on 
 10G for new development.
 

I think there'd be plenty of value in some common XGMII code; 
particularly for formulating and gathering stats. There are well laid 
down standards for MMD register sets and the PHYs I've met stick to them 
pretty well. As for common XGMII reset or init. code, I'd say there 
would be less value; it can be very PHY specific.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Seeing lots of flow control pauses

2008-04-17 Thread Paul Durrant
Matty wrote:
 
 We have a couple of Nevada build 81 hosts (Sun X2200s) plugged into a
 Cisco 4948. While perusing the switch counters, we noticed that the
 RxPause column for the switch ports connected to these hosts is
 non-zero:
 
 as101#show flowcontrol
 Port    Send FlowControl   Receive FlowControl   RxPause  TxPause
         admin    oper      admin     oper
 ------  -----    -----     -------   ----        -------  -------
 Gi1/5   off      off       desired   on           849066        0
 ...
 Gi1/11  off      off       desired   on          1299244        0
 
 Does anyone happen to know what conditions will lead a Solaris host to
 send RX pause messages? I have been googling and reading through
 various pieces of documentation, but nothing appears to describe the
 system or application behaviors that can lead to non-zero RxPause
 values.
 

Pause messages are usually generated directly by the network adapter 
with s/w usually only getting involved to set some FIFO thresholds. The 
likelihood is then that whatever traffic is going to the X2200 is 
causing some hardware FIFO to fill beyond some threshold and thus pause 
packets are being sent back to the switch. What hardware are you 
interfacing to in the X2200 (it's not a machine I'm familiar with)?

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Large receive packets (LRO)

2008-04-17 Thread Paul Durrant
Yunsong (Roamer) Lu wrote:
 Please file a CR to track it. I guess you're preparing to integrate LRO 
 support in xge, if so you may putback this fix together. It's not really 
 an official support for LRO, but a simple bugfix that remove the barrier 
 in IP.
 

Could you *please* let me have the source patch too? I'd like to patch 
it into my tree so I can do my LRO development and testing. Thanks,

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Large receive packets (LRO)

2008-04-16 Thread Paul Durrant
Xiu-Yan Wang wrote:
 
 Roamer Lu has a prototype fix for this limitation. I or himself can
 provide the fix if you want.
 

Cool :-) Please send me the patch. I'd like to test it out. Any idea of 
the integration schedule for this fix? Do you have a CR number?

   Paul

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Large receive packets (LRO)

2008-04-16 Thread Paul Durrant
Xiu-Yan Wang wrote:
 
 The fix requires regenerating the ip module. Please let me know the build
 of Solaris/OpenSolaris that you work on so I can generate the binary for
 you.
 

I'm BFUed to snv_86

 There is no plan to integrate the fix yet as it has not been tested
 throughly. But it maybe occur in the near future if you and Drew both
 are happy with the fix. :-) There is no CR filed for this.
 

Could you file a CR? I think both Drew and I would *really* like this to 
be fixed.
If you can send us the source patch it would make life easier as I can 
integrate it into my mercurial gate and re-build my own IP module; and I 
can also review the code for you :-)

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Large receive packets (LRO)

2008-04-16 Thread Paul Durrant
Andrew Gallatin wrote:
 Once the main obstacle to LRO is cleared by this patch, what is the
 general opinion of LRO?
 

Drew,

   As you point out, even when you can get line rate without LRO the CPU 
load is massive. There are plenty of pitfalls in an LRO implementation 
but if it's done correctly then I view it as a big win.
   Clearly, as you say, it's not useful if packets are being forwarded; 
but this will be true for the majority of boxes. For safety though, I 
think LRO should be a driver option that defaults to 'off'. With stack 
support (i.e. a down call that can tell us if an interface is being used 
for forwarding or not) then we could turn it 'on' by default.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Large receive packets (LRO)

2008-04-16 Thread Paul Durrant
Garrett D'Amore wrote:
 
 Stack support would be a good thing here, although I suppose it sort 
 of pollutes the nice layering between layer 2 and layer 3 that the 
 current Nemo/IP stack has.
 
 I'm moderately opposed to a manual tunable, if at all possible to do 
 without ... these become call generators, and require extra 
 documentation, training, etc.
 

For the sake of expediency I'd prefer the IP bug to be fixed, thus 
allowing ad hoc per-driver LRO tweakables in the first instance to be 
followed (hopefully quickly) by an LRO on/off switch courtesy of GLDv3, 
and then possibly some generic LRO inside GLDv3.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


[networking-discuss] Large receive packets (LRO)

2008-04-15 Thread Paul Durrant
Does anyone who's more familiar than I with the current workings of the 
opensolaris IP stack know of any reason why it would not be able to 
accept TCP packets > MTU on the receive side? I'm trying to assess 
whether it's worth putting LRO support into my driver.
If no-one has a 'this will definitely not work' answer then I'll give it 
a go ;-)

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Large receive packets (LRO)

2008-04-15 Thread Paul Durrant
Andrew Gallatin wrote:
 Paul Durrant writes:
   Does anyone who's more familiar than I with the current workings of the 
   opensolaris IP stack know of any reason why it would not be able to 
   accept TCP packets > MTU on the receive side? I'm trying to assess 
   whether it's worth putting LRO support into my driver.
   If no-one has a 'this will definitely not work' answer then I'll give it 
   a go ;-)
 
 On at least some versions of Solaris, if a packet has more than some
 small number (2?)  of mblks chained together, the stack takes a slow
 path.  This is why I never did LRO in our Solaris driver.  If this is
 fixed in some version of Solaris or OpenSolaris, I'd love to hear
 about it.
 

I'll wade through the code and see if I can find this slow-path. Ta.

 FWIW, Solaris was the first OS where I ever saw line rate receive
 without LRO.
 

At MTU=1500? I haven't tried tuning my b/w up to the max. yet but doing 
a bit of profiling suggests the per-packet conn lookup is being hit a 
lot - hence my desire to pass up larger packets.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Large receive packets (LRO)

2008-04-15 Thread Paul Durrant
Mike Gerdts wrote:
 
 Sure... with a bit of TCP tuning (which should be default...) I have
 done about 900+ megabits per second with iSCSI sequential reads tests.

megabits? I think we're talking line rate at 10G ;-)

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Paging in network driver

2008-04-08 Thread Paul Durrant
Pradeepg wrote:
 Is it possible to implement page mechanism in Network drivers 
 for allocating DMA buffers ( instead of using ddi functions to 
 allocate DMA buffers) .

Can you elaborate? What do you mean by a 'page mechanism'?

___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] dladm - command line syntax scaling...

2008-03-14 Thread Paul Durrant
Darren Reed wrote:
 
 Thoughts?
 

We thought about such things before Nemo was delivered into ON and may 
have even had a version of dladm with a CLI as you suggest. However such 
a CLI did not appear to meet the CLIP rules as laid down in the PSARC 
case (can't remember the case # off the top of my head) and so we have 
the present CLI.
Personally I don't think a general create verb would lessen the 
documentation required for introducing a new kind of object, nor would 
it necessarily simplify the CL parsing required in dladm. A flat 
verb-object namespace is probably simpler to extend.

   Paul
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] I/OAT

2007-11-15 Thread Paul Durrant
On 15/11/2007, Andrew Gallatin [EMAIL PROTECTED] wrote:
 On Windows, which has an option
 to avoid polling, we saw a decrease in CPU overhead.


That's interesting. Which option is that? I guess Windows APIs are
generally better suited because of their asynchronous nature. Some
Solaris functions also have asynchronous variants, but I'm not sure
how much they are used.

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] Is mblk_t now our biggest enemy?

2007-09-20 Thread Paul Durrant
On 19/09/2007, Erik Nordmark [EMAIL PROTECTED] wrote:

 If the GLD framework causes msgdsize to be called multiple times for the
 same packet, then it might make sense to expand on the GLD interfaces to
 pass the packet length along.


If the nemo TX entry point is only ever called with one packet at a
time then clearly, adding the packet length as an argument would be
easy to do; otherwise you're looking at attaching packet metadata to
the mblk and that's a whole other can of worms.

BTW; if the nemo tx entry point really *is* only ever called with one
packet at a time then the driver's TX code could be significantly
simplified (since they have to assume a packet chain at the moment).

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] sendfile performance regression due to esballoca

2007-09-20 Thread Paul Durrant
On 20/09/2007, Andrew Gallatin [EMAIL PROTECTED] wrote:

 I'm pretty sure I've traced this to revision 2994 of
 ./src/uts/common/fs/sockfs/socksyscalls.c where the mblk for the data
 is now allocated via esballoca rather than desballoca.  According to
 the logs, this was to fix a recursive mutex enter in the ce driver
 (bug 6459866).

 Rather than take a hammer to sendfile, wouldn't the correct fix have
 been to change the ce driver so that it didn't hold its tx mutex while
 freeing mblks?


Indeed. Since ce is only a 1G part, and this poor 'workaround' is
hitting 10G performance, ce should really be fixed properly. Is ce in
ON these days? I've not looked.

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant
___
networking-discuss mailing list
networking-discuss@opensolaris.org


Re: [networking-discuss] nemo mac_notify change (webrev)

2007-09-09 Thread Paul Durrant
On 09/09/2007, Peter Memishian [EMAIL PROTECTED] wrote:

 I presume you mean mac_unregister() above?  When I was originally talking
 through this problem with Seb (which led to mac_condemn()), my contention
 was that the very notion that a destructive operation can fail represents
 a design flaw (indeed, we have many of these in Unix -- close(2) being the
 most notable).  The introduction of mac_condemn() should make it possible
 to ensure that mac_unregister() *will* not fail (short of passing it bogus
 arguments or other minutia), thus eliminating any need to worry about
 undoing partial teardowns.


The general idea behind mac_unregister() and the fact that it can fail
is that it should be the first thing called by a driver's detach(9e)
entry point. If it fails, detach() bombs out and nothing further is
done. (This is similar to the check that DLPI drivers need to do for
open streams, although qattach may have done away with the need for
that check.)

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] GLDv3 blanking

2007-09-07 Thread Paul Durrant
On 07/09/2007, Andrew Gallatin [EMAIL PROTECTED] wrote:

 I'm still curious, even if only for academic reasons.  What unit of
 time is the time_t in?  How would one convert it to microseconds?
 I guess it is really academic at this point, since there are no
 callers (and if there were, I could look at the caller to figure
 this out..).


The units are whatever you make them actually ;-) In the
mac_resource_t structure you can specify a time. Multipliers are
applied to this value and then it is passed to blank function so there
are no absolutes.
BTW, it can be called! Check out the SQS_POLLING_ON() and
SQS_POLLING_OFF() macros. The function registration path obfuscates
the calls, but they *are* there (and I've seen them happen, albeit on
a non-current build of Nevada).

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] nemo mac_notify change (webrev)

2007-09-07 Thread Paul Durrant
On 07/09/2007, Sebastien Roy [EMAIL PROTECTED] wrote:
  FWIW, I find the current mac_register_t scheme ugly; I don't see why a
  statically defined mac_register_t cannot simply be passed into
  mac_register(). I don't see why we need separate allocator functions
  there.

 It's done this way so that the structure can be expanded without
 affecting binary compatibility.  We can add a new field to the structure,
 and the system won't crash and burn when an existing driver registers.
 The framework will allocate an appropriately sized structure, but the
 older driver simply won't fill in the new fields it doesn't know about.


Agreed, but you can still do that by the driver exporting a version
field in a statically defined structure. The framework just parses the
versioned structure and copies relevant fields into the mac_impl_t. No
need for the extra function calls.

  I also don't see that you can't break this race using reference
  counting. There's no need for mac_unregister() to actually free the
  mac_impl_t; it could just drop a ref. and the last man out frees.
  Let's try not to complicate the interface because of implementation
  artifacts.

 I don't believe a reference counting scheme makes a difference in the
 scenario that Thiru described.  He's describing a scenario where the
 mac_impl_t is freed by mac_unregister() before another thread gets a
 chance to examine that mac_impl_t to verify its contents (either for a
 flag or a reference count, it doesn't matter, the end result is still the
 same; boom!)


Ok. I misunderstood the argument.

 IMO, this case is borderline a driver bug.  The driver has called some
 MAC function (which hasn't returned yet), and decides to call
 mac_unregister().  mac_unregister() is documented to invalidate the MAC
 handle, so drivers shouldn't knowingly call mac_unregister() while using
 that handle, or use the handle during or after calls to mac_unregister().

 To me, this is a similar problem to calling free() on a buffer that an
 application knows that it's still using...  Synchronization on the
 allocation and deallocation of that buffer can't be done by the memory
 framework, it has to be done by the application.  The framework can't
 possibly know what the application is going to do with that buffer once
 the free() function has been called.


Absolutely agree. If a driver is stupid enough to mac_unregister()
whilst there are outstanding threads up-calling then it deserves all
it gets!

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] nemo mac_notify change (webrev)

2007-09-07 Thread Paul Durrant
On 07/09/2007, Garrett D'Amore [EMAIL PROTECTED] wrote:

 One more point.  I would prefer adding some complexity to the interface,
 if such complexity (maybe a couple extra function calls) can save a
 great deal of implementation effort in the consumers of the interface.

Well, it's not me who has to document the interface (this time) so I
guess that's up to you ;-)

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] nemo mac_notify change (webrev)

2007-09-07 Thread Paul Durrant
On 07/09/2007, Garrett D'Amore [EMAIL PROTECTED] wrote:

 There are further complexities, because the driver cannot know if any
 upstream callers are still busy until it calls mac_unregister().  And
 then, because mac_unregister *also* frees the structure, its too late.


You mean upstream callers that have done mac_open()? That should take
a ref. on the mac_impl_t.

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Is it time to break up networking/approach community?

2007-09-07 Thread Paul Durrant
On 06/09/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 For OpenSolaris, layer 2 networking is about:
 - the mac/dls interfaces underneath IP that device drivers use
 - link layer protocols (802.1X, LACP, LLDP)
 and the related management problems.


Whilst there's clearly a lot of activity at layer 2 (and about time)
I'm not entirely sure the layer model is clean enough to separate
discussion of DLS and IP; particularly with crossbow under active
development.

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] nemo mac_notify change (webrev)

2007-09-07 Thread Paul Durrant
On 06/09/07, Garrett D'Amore [EMAIL PROTECTED] wrote:
 (Obviously mac_alloc() here would return mac_t *, but under the covers
 it is really allocating a whole mac_impl_t.  And mac_free() would be
 free()'ing a a whole mac_impl_t, though it is taking a mac_t as its
 argument.)

 If we want to pursue this course of action, I can put together a webrev
 with the changes... it will take me a while because there are a *lot* of
 drivers to change... but I want to get this change in *before* we make
 GLDv3 public.

FWIW, I find the current mac_register_t scheme ugly; I don't see why a
statically defined mac_register_t cannot simply be passed into
mac_register(). I don't see why we need separate allocator functions
there.
I also don't see that you can't break this race using reference
counting. There's no need for mac_unregister() to actually free the
mac_impl_t; it could just drop a ref. and the last man out frees.
Let's try not to complicate the interface because of implementation
artifacts.

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] TCP/IP Plumbing Evolution

2007-08-30 Thread Paul Durrant
On 30/08/2007, Sebastien Roy [EMAIL PROTECTED] wrote:
 That said, there are some things that Nemo doesn't do today that would
 prevent its replacing DLPI.  The DL_NOTIFY_* and DL_CAPABILITY_*
 mechanisms are two I'm thinking of, and there may be others.  There are
 equivalents at the Nemo MAC-layer, but nothing at the link-layer.  Making
 Nemo ubiquitous is only part of the problem that needs to be solved to
 push DLPI out of this particular scene.


Agreed, but it's a worthy goal IMHO :-)

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] on the notion of loanup

2007-06-29 Thread Paul Durrant

On 28/06/07, Garrett D'Amore [EMAIL PROTECTED] wrote:


We do not use posting of buffers from application space.  That's not
how our network stack operates.



...until someone implements extended sockets ;-)

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Re: Require guidance in PFIL and binding to the required interface dynamica

2007-06-14 Thread Paul Durrant

On 13/06/07, Lenin [EMAIL PROTECTED] wrote:


If you analyze the dtrace output, I could see that the IP header is printed for
v4 packets, and it's not printed for v6 packets [check ether type 86dd]. Does
this mean that this packet is silently dropped by the interface during
bge_send? Please clarify!


No. If you read the dtrace manual you will see that you're hooking the
function entry point, so the function has not run yet and hence could
not have discarded the packet.
Clearly, since your unpredicated probe fires and your predicated one
doesn't, the problem must be the predicate!

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Re: how to use kmdb?

2007-06-08 Thread Paul Durrant

On 07/06/07, Tom Chen [EMAIL PROTECTED] wrote:

I thought kmdb is just like gdb of Linux which displays source code. Even 
windows kernel debugger can display too.



Now that would just be too easy ;-)

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Re: Manipulating mblk_t !

2007-06-07 Thread Paul Durrant

On 06/06/07, Lenin [EMAIL PROTECTED] wrote:

Here's the code in question :

V6 packet creation goes here :

if ((m = (mblk_t *)allocb(40, BPRI_HI)) == NULL)
        cmn_err(CE_NOTE, "OOM - can't create v6 pkt");

mt->b_rptr += off;


What is 'off'?

Be aware that your original packet may not be in a single dblk_t. You
probably need to do a pullupmsg(m, off + iphlen) to be sure you've
enough data in the leading dblk_t.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] how to use kmdb?

2007-06-07 Thread Paul Durrant

On 06/06/07, Tom Chen [EMAIL PROTECTED] wrote:


I use ::bp qla_gld_intr to set a breakpoint and later it really stops there. However 
it looks like it steps through assembly code; how can I see the C source code and how do I 
check the value of local/global flags? I tried to google a kmdb tutorial, but did 
not find one.



Neither mdb nor kmdb are source-level debuggers. You'll need to figure
out which registers local variables are in at your breakpoint by using
a disassembly of the function. Globals you should be able to reference
by name though.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Manipulating mblk_t !

2007-06-06 Thread Paul Durrant

On 06/06/07, Lenin [EMAIL PROTECTED] wrote:

How do I remove the IPv4 header from an mblk_t and replace it with, say, 
a new IPv6 header?

I tried creating a new mblk_t (with allocb) and then created the v6 header
first, then retrieved the TCP header from the original mblk_t and
attached it to the v6 header. I did a putnext after freeing the original mblk_t.
Is this approach correct? I can't seem to get this working; I only
see the kernel crash.


This was possibly due to freeing the original mblk_t.

I suggest you move the b_rptr of the original IPv4 block to point to
the TCP header (I assume that you have a single mblk_t). Then
allocate a new mblk_t to contain your IPv6 header and link it before
the original mblk_t. Now you have a pair containing a v6 packet.
Alternatively, if you don't care about losing the original v4 header
you could simply try overwriting it in the original.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Re: Device driver Debugger

2007-06-05 Thread Paul Durrant

On 05/06/07, Tom Chen [EMAIL PROTECTED] wrote:

Thanks! But it looks like mdb -KF is entered on another Solaris machine. 
I only have a Windows PC and a Linux PC; have you ever tried to use Windows/Linux 
to debug a Solaris server?


Yes. You just need serial console set up on the Solaris end (add
-Bconsole=ttya to your grub line) and a terminal set to 9600,8,n,1 on
the client. I've used minicom quite successfully on Linux.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] simple codereview needed (P1 fix)

2007-06-05 Thread Paul Durrant

On 05/06/07, James Carlson [EMAIL PROTECTED] wrote:

Garrett D'Amore writes:
 I need 2 reviewers for the following fix, which addresses a P1 bug in
 DLS (where alignment is not checked).

 The bug fixed is 6557249 DLS layer now expects the IP addresses to be
 aligned

I thought meem had commented on this one at some point -- asking
whether fixing it here was really the right thing to do.  It means
that we leave a lot of performance on the table with drivers that are
persistently doing the wrong thing.  The original designers here, as
best I can tell, were intentionally _assuming_ that the driver would
do something sane to provide correct alignment.



Indeed. IIRC it used to be the case that the old IP fast-path would
ASSERT in a DEBUG kernel that the IP header was 32-bit aligned.


So, should this be fixed or should the driver that provoked the panic
be fixed?



I'd say the driver should be fixed otherwise a suboptimal codepath is
being hidden. It doesn't seem unreasonable to demand that all Nemo
drivers should pass up packets such that their payload (not
necessarily IP) is 32-bit aligned; although such alignment would only
really be necessary on sparc since x86 platforms have no problem with
unaligned accesses.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] How to retrieve IP address from mblk_t

2007-05-24 Thread Paul Durrant

On 24/05/07, Lenin [EMAIL PROTECTED] wrote:

Am implementing a packet filter in solaris 9. Am able to create a STREAMS 
module and insert it between the IP and NIC. Am able to see the upstream and 
downstream messages flowing through my module.

How do i extract the IP addresses from mblk_t in case of downstream messages ? 
I need to extact the destination IP address and do some processing based on 
that. Can some one throw me some light on how to get this done ?



Well, you need to be aware of DLPI and also the IP fast-path
mechanisms that are used to send packets down. From IP you should be
seeing packets in fast-path format, that is, simple chains of M_DATA
blocks (with no M_PROTO on the front).
Assuming you're using an ethernet NIC and you're not using a VLAN,
you simply need to skip the 14-byte ethernet header to get at the IP
packet.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] draft case for link status logging changes (GLDv3)

2007-05-24 Thread Paul Durrant

On 24/05/07, Garrett D'Amore [EMAIL PROTECTED] wrote:


b) ease of implementation.  i can print the link up/down state
without having to inquire the details from the driver, which avoids a
potential recursive lock situation if the driver happens to be careless
with  the context from which it calls mac_link_update().

In the future, it may be a nice thing to provide media-specific extended
information in mac_link_update(), such as speed, duplex (for 802.3),
ssid/bssid (for 802.11) etc.



Given that Nemo is still private it may be best to ensure that
mac_link_update() *is* called from a sensible locking context for all
drivers then you *can* enquire about other link info.
Personally I prefer information to be available from a single
definitive source. Thus I would prefer that mac_link_update() actually
carries *no* information about link state at all; it merely says that
a client needs to go and check the link because something happened.
All relevant link information is available from the mac_stat_get()
call so this can be used to retrieve relevant information.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] draft case for link status logging changes (GLDv3)

2007-05-24 Thread Paul Durrant

On 24/05/07, Garrett D'Amore [EMAIL PROTECTED] wrote:


That's the way it is implemented in my workspace right now; it's just
that the logging functionality doesn't provide anything other than the
up/down notification.  Details have to be retrieved from dladm or
kstats.



Indeed, and I think up/down is too much information. That should also
be retrieved from stats.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Connecting Solaris 10 to My Company LAN

2007-05-23 Thread Paul Durrant

On 23/05/07, Musa Mohammed [EMAIL PROTECTED] wrote:

Please, can someone help out. I just installed Solaris 10
on my laptop. But the problem is I can't get the sound and
my network card to work. I tried using
sys-unconfigure
It did not show any screen where I can enter the networking
details. Also tried the ifconfig command, the interface could not
be detected.



Sounds like you don't have a driver for your network device. What is it?

[Also, this alias is for discussion of OpenSolaris (Nevada), not s10.
If you want s10 support then you need to go to Sun].

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] desballoc buffer return issue

2007-05-16 Thread Paul Durrant

On 16/05/07, Tom Chen [EMAIL PROTECTED] wrote:


If a packet is saved in two mblk_t structures, mp1 and mp2, and I use 
linkb(mp1, mp2) to link them together, and then use gld_recv(mp1) 
to send the packet to the OS, the OS can recognize this packet. I think both 
buffers should be returned from the OS afterwards. However, I find only the mp2 
or mp1 buffer is returned back to the driver.

Is there any way to make the OS return all linked mblk_t buffers?



The OS should return the buffers. Are mp1 and mp2 both created using
desballoc()?

The frtn_t->free_func is called whenever the ref. count on the dblk of
a desballoc-ed buffer reaches zero during a freeb(). If a
desballoc-ed buffer is not being returned then it implies it is not
being freed. If that's the case and you run with kmem_flags set
to 0xf, you should be able to spot the leak using ::findleaks from
mdb run on a core dump of your system.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Re: Re: [osol-discuss] Nemo project proposal

2007-05-16 Thread Paul Durrant

On 15/05/07, John Galloway [EMAIL PROTECTED] wrote:

Ah thanks, I get it.  We have it plugged into an unmanaged switch.  When you 
talk about turning LACP on or off, where is this done wrt Solaris?


See dladm(1m); the --lacp-mode option

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] Solaris VLAN support

2007-05-03 Thread Paul Durrant

On 5/2/07, Tom Chen [EMAIL PROTECTED] wrote:


I am now writing a gigabit ethernet driver, currently a GLDv2 driver. I am thinking 
about whether I should enable VLAN for my GLDv2 driver or upgrade to GLDv3 
first and then enable VLAN. I found some GLDv2 drivers use gldm_send_tagged 
to send VLAN packets, but most GLDv2 drivers do not support VLAN.
I could not find any man page for this API. Is VLAN not fully supported in 
GLDv2? I am wondering about the VLAN capabilities (what it can do, what it cannot) in 
GLDv2 and GLDv3. Any improvement in GLDv3?

Can someone give an explanation?



GLDv2 supports VLANs to a limited extent. The work was originally done
for the bge driver since the BCM5794 chip that it was originally
written for can do tag insertion/strip in hardware. Hence the
gldm_send_tagged() entry point simply supplies the VLAN tag in the
expectation that the driver just passes it through to the h/w. This
interface is not documented as it was never intended to be used by IHV
GLDv2 drivers.
Nemo was written with the intention of providing VLAN and link
aggregation support for all drivers so, if you write a Nemo driver,
the framework will handle tag insertion/strip for you and you will be
able to aggregate your links. I get the impression that VLAN support
in Nemo is in a state of flux but, not being a Sun insider, I don't
actually know what's happening and when. I'm fairly sure, though, that
the changes will be fairly well, if not totally, hidden from any Nemo
drivers. The only downside to writing a Nemo driver is that the
interface is *still* not public, so your driver is tied into ON
builds.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] manual pages for device drivers

2007-04-27 Thread Paul Durrant

On 4/26/07, Garrett D'Amore [EMAIL PROTECTED] wrote:

In reviewing a few different manual pages for various ethernet device
drivers, I'm finding a few things that I think are a bit undesirable.

First off, nearly all of the manual pages repeat the same information
about DLPI details that is common to all ethernet drivers.  I'd really,
really like to have this information in a common manual page for
ethernet.



Most of the man pages (like a lot of the DLPI drivers) are cut'n'paste
jobs, I believe, so you're going to find a lot of repeated info. What we
probably need is some sort of cascading reference:

- driver manpage talks about only what *it* supports (e.g. ioctls/ndd
  params that are not common) and then refers to the 'ethernet' page
- ethernet manpage talks about ndd params/ioctls etc. common to all
  ethernet drivers and then refers to the DLPI page
- DLPI page talks about DLPI providers, styles, device nodes, Sun extensions, etc.


Also, a lot of them repeat the same ndd information that is available as
part of the ieee802.3 man page.



In that case I guess maybe we don't want an ethernet page, maybe the
802.3 page should just be expanded to cover what's needed over and
above ndd. BTW, is some weight now going behind ndd? A couple of years
ago it was supposed to be being phased out and replaced by something
better (something perhaps integrated into the SMF). I've long
advocated improving the ddi_prop_op interface into drivers such that
'ndd' params and driver properties could be unified.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] manual pages for device drivers

2007-04-27 Thread Paul Durrant

On 4/27/07, Sebastien Roy [EMAIL PROTECTED] wrote:

Paul Durrant wrote:
  BTW, is some weight now going behind ndd? A couple of years
 ago it was supposed to be being phased out and replaced by something
 better (something perhaps integrated into the SMF).

There happens to be a project doing just that:

http://www.opensolaris.org/os/project/brussels/



Any design doc./info. I can look at? The requirements doc. doesn't
give much away.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant


Re: [networking-discuss] guidance for link state debouncing?

2007-04-26 Thread Paul Durrant

On 4/25/07, Garrett D'Amore [EMAIL PROTECTED] wrote:


I was actually thinking that this form of debounce maybe should be
handled automatically by nemo.  I suspect it would simplify a lot of
drivers that have their own code to debounce these notifications.



Absolutely. Second guessing what IPMP wants is *not* the job of every
device driver writer out there.

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant

