To add to Nick's comments: this fix also made it into the latest S10 kernel patches, 138889-03 (for x86) and 138888-03 (for SPARC).
These patches came out this month. I personally haven't tried them out, but
I think it would be worth giving them a try to see if they help. If someone
has already tried the patch (or does try it as a result of this e-mail
thread), I would appreciate it if you could share your experience.

Thanks,
-ashu

Nicholas Solter wrote:
> Leon Koll wrote:
>> Hi Ashu,
>> I see the bug report
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6616644 that was
>> reported about the same problem.
>> Was it fixed, and in which version of Solaris?
>
> Leon,
>
> It looks like that bug was closed as a duplicate of 6529822, which was
> fixed in build 88 of ON.
>
> I don't know anything more than that, though.
>
> Thanks,
> Nick
>
>> Thanks,
>> -- Leon
>>
>> On Thu, Nov 1, 2007 at 11:57 PM, Ashutosh Tripathi
>> <Ashutosh.Tripathi at sun.com> wrote:
>>> Hi Leon, Jacob,
>>>
>>> Both of you asked for more details about clhbsndr.
>>>
>>> Cluster membership is a very detailed and complex topic, and
>>> I would encourage you to participate in an Open HA Cluster user group
>>> meeting happening near you (or Sun Tech Days, where SC presents
>>> too) to learn about it in more depth. You can meet face to face with
>>> cluster engineers for detailed back-and-forth technical discussions.
>>>
>>> Over e-mail I can only go so deep, but I will try
>>> to answer your questions, at least at a high level. Please
>>> see below.
>>>
>>> Leon Koll wrote:
>>>> Hi Ashu,
>>>>
>>>> thank you for your efforts.
>>>> Two questions:
>>>> 1. The clhbsndr module is undocumented, which is why I am asking:
>>>> why does the cluster need it on PUBLIC interfaces?
>>>> My guess: it's not needed there, but it's much easier to push it to
>>>> all interfaces than to find the private ones and push it to their
>>>> stack.
>>> It is needed in some situations, particularly in Solaris 9, where
>>> network interrupts coming in on the public network can interfere with
>>> cluster heartbeats. The clhbsndr module helps in such situations.
>>>
>>>> 2. Another problem that we saw: the nxge interface is not in the
>>>> /etc/iu.ap file. Looks like a resurrection of the 5-year-old bug 4643340.
>>>> How does the cluster work with a private interconnect on nxge's without the
>>>> clhbsndr module?
>>> On the private interconnects, a different mechanism is used, as
>>> the Cluster framework controls the plumbing and setup of the network
>>> stack. For the public network, an update to the iu.ap file is needed, as
>>> you found out in 4643340.
>>>
>>> Always consult the SC support matrix for questions about specific
>>> hardware support.
>>>
>>> Jacob wrote:
>>>> Do you have an estimate on official statement/release?
>>> I don't have an estimate right now. I hesitate to speculate
>>> on where this will go. One of my colleagues alerted me to the fact that
>>> there is an ongoing Escalation on this issue, so I would just say:
>>> rest assured that Sun is looking at this as a high-priority issue.
>>>
>>> HTH,
>>> -ashu
>>>
>>>
>>>
>>>> On 11/1/07, Ashutosh Tripathi <Ashutosh.Tripathi at sun.com> wrote:
>>>>> Hi Leon,
>>>>>
>>>>> Thanks for getting back to us on this.
>>>>>
>>>>> We are still analyzing the issue and are not sure yet if the
>>>>> problem is with the clhbsndr module, its interactions with
>>>>> the Solaris STREAMS framework, or something else entirely.
>>>>>
>>>>> While we are working on an official statement on this, I would
>>>>> suggest the following unofficial approach in the meantime.
>>>>>
>>>>> Go ahead and run without the clhbsndr module on the
>>>>> cluster public interface, but beware that if you log a
>>>>> support call on this cluster, particularly one related
>>>>> to cluster membership and heartbeats, the cluster support
>>>>> personnel may ask you to reproduce the issue without
>>>>> this interim fix.
>>>>>
>>>>> Hope that answers your questions,
>>>>>
>>>>> Best Regards,
>>>>> -ashu
>>>>>
>>>>>
>>>>> Leon Koll wrote:
>>>>>> Hi Ashu,
>>>>>>
>>>>>> I am working with Jacob on this problem.
>>>>>> The command you've sent fixed the problem.
>>>>>> Q1: Is it safe to remove the clhbsndr module from cluster public
>>>>>> interfaces?
>>>>>> Q2: Is it a known bug?
>>>>>>
>>>>>> Thanks a lot,
>>>>>> -- Leon
>>>>>>
>>>>>> On 11/1/07, Ashutosh Tripathi <Ashutosh.Tripathi at sun.com> wrote:
>>>>>>> Hi Jacob,
>>>>>>>
>>>>>>> Additionally,
>>>>>>>
>>>>>>> Can you remove the clhbsndr module from the e1000g adapter:
>>>>>>>
>>>>>>> e.g.: ifconfig e1000g0 modremove clhbsndr@2
>>>>>>>
>>>>>>> and report back what you find?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -ashu
>>>>>>>
>>>>>>>
>>>>>>> LaoTsao(Dr. Tsao) wrote:
>>>>>>>> hi
>>>>>>>> Maybe this is related to the IPMP that is required by Sun Cluster.
>>>>>>>> When you ran iperf -s server-IP, did you use the Logicalhost IP address of
>>>>>>>> the server?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Jacob wrote:
>>>>>>>>
>>>>>>>>> Hi List,
>>>>>>>>> I've heard many good things about this list and the opensolaris
>>>>>>>>> community, and I sure hope someone here can help me out.
>>>>>>>>>
>>>>>>>>> One of our systems is suffering from very poor network throughput,
>>>>>>>>> which appears to be affected by Sun Cluster.
>>>>>>>>>
>>>>>>>>> The system consists of 3 T2000 machines in Sun Cluster (3.2) running
>>>>>>>>> on Solaris 10 u4.
>>>>>>>>> The network throughput in non-cluster mode is about 800 Mbit/s on a
>>>>>>>>> single e1000g interface.
>>>>>>>>> The network throughput falls by about 50% when booting the
>>>>>>>>> machine(s) in cluster mode.
>>>>>>>>> To isolate possible LAN problems, I've connected two machines using
>>>>>>>>> a crossover cable - same result.
>>>>>>>>>
>>>>>>>>> The problem was reproduced by installing a brand new T2000 machine
>>>>>>>>> with a similar configuration as a single-node cluster.
>>>>>>>>>
>>>>>>>>> All throughput measurements were done using iperf.
>>>>>>>>>
>>>>>>>>> Has anyone encountered something similar?
>>>>>>>>> Does anyone have experience with T2000 machines in Sun Cluster with
>>>>>>>>> regard to network performance?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> This message posted from opensolaris.org
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> ha-clusters-discuss mailing list
>>>>>>>>> ha-clusters-discuss at opensolaris.org
>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
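
A note for anyone trying the interim modremove workaround from this thread: the number given after the module name is the module's position in the interface's STREAMS stack, and it may not be 2 on every system. Running `ifconfig <interface> modlist` shows the positions. Below is a minimal Python sketch, not an official Sun Cluster tool, that looks up the position and builds the matching modremove command. It assumes modlist prints one "position module" pair per line; the function names and the sample output are made up for illustration and were not captured from a real node.

```python
def find_module_position(modlist_output, module="clhbsndr"):
    """Return the STREAMS position of `module`, or None if it is not listed."""
    for line in modlist_output.splitlines():
        fields = line.split()
        # Assumed line format: "<position> <module>", e.g. "2 clhbsndr".
        if len(fields) == 2 and fields[0].isdigit() and fields[1] == module:
            return int(fields[0])
    return None


def modremove_command(interface, modlist_output, module="clhbsndr"):
    """Build the ifconfig modremove argument list, or None if nothing to remove."""
    position = find_module_position(modlist_output, module)
    if position is None:
        return None
    return ["ifconfig", interface, "modremove", "%s@%d" % (module, position)]


# Hand-typed sample of what `ifconfig e1000g0 modlist` might print
# (hypothetical, for illustration only).
SAMPLE = "0 arp\n1 ip\n2 clhbsndr\n3 e1000g\n"

if __name__ == "__main__":
    command = modremove_command("e1000g0", SAMPLE)
    if command is None:
        print("clhbsndr is not pushed on this interface")
    else:
        print(" ".join(command))
```

On a real node you would feed the function the actual output of `ifconfig e1000g0 modlist` and run the resulting command as root. The support caveat earlier in the thread about running without clhbsndr on public interfaces still applies.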