To add to Nick's comments: this fix also made it into the latest
kernel patches for Solaris 10, 138889-03 (x86) and 138888-03
(SPARC).

These patches came out this month. I haven't tried them myself,
but I think they are worth a try to see if they help.
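
Before testing, it may help to confirm whether a node already has the
patch. The following is a minimal sketch, not an official procedure. It
assumes that `showrev -p` prints one line per installed patch beginning
with "Patch: <id>-<rev>", and the function name has_patch is made up for
illustration. It is written for bash or ksh rather than the legacy
/bin/sh.

```shell
# Sketch only: succeed if patch <id> is installed at revision >= <rev>.
# Reads `showrev -p` output on stdin, so it also works on saved output.
# Assumption: installed patches appear as lines "Patch: <id>-<rev> ...".
has_patch() {
    want_id=$1     # e.g. 138888 (SPARC) or 138889 (x86)
    want_rev=$2    # minimum revision, e.g. 03
    while read -r tag patch rest; do
        [ "$tag" = "Patch:" ] || continue
        id=${patch%%-*}     # text before the first "-"
        rev=${patch##*-}    # text after the last "-"
        if [ "$id" = "$want_id" ] && [ "$rev" -ge "$want_rev" ] 2>/dev/null; then
            return 0
        fi
    done
    return 1
}

# Usage on a live system (not run here):
#   showrev -p | has_patch 138888 03 && echo "patch present"
```

The function reads from stdin rather than calling showrev itself, so the
same check can be run against output collected from several nodes.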

If anyone has already tried the patch (or tries it as a result
of this e-mail thread), I would appreciate it if you could
share your experience.
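
When reporting results, it is useful to say whether clhbsndr was still
on the public interface's stream at the time of the measurement. A
minimal sketch, assuming `ifconfig <interface> modlist` prints one
"<position> <module>" pair per line; the function name clhbsndr_pos is
hypothetical, and the code is written for bash or ksh:

```shell
# Sketch only: report where clhbsndr sits in an interface's STREAMS
# module list. Reads `ifconfig <interface> modlist` output on stdin.
# Assumption: each line has the form "<position> <module>".
# Prints the position and returns 0 if the module is present;
# prints nothing and returns 1 otherwise.
clhbsndr_pos() {
    while read -r pos mod; do
        if [ "$mod" = "clhbsndr" ]; then
            echo "$pos"
            return 0
        fi
    done
    return 1
}

# Usage on a live system (not run here):
#   ifconfig e1000g0 modlist | clhbsndr_pos
```

Running it before and after the patch, next to the iperf numbers, shows
whether a throughput change came from the patch or from the module
having been removed by hand.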

Thanks,
-ashu


Nicholas Solter wrote:
> Leon Koll wrote:
>> Hi Ashu,
>> I see that bug report
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6616644 was
>> filed for the same problem.
>> Was it fixed, and if so, in which version of Solaris?
> 
> Leon,
> 
> It looks like that bug was closed as a duplicate of 6529822, which was 
> fixed in build 88 of ON.
> 
> I don't know anything more than that, though.
> 
> Thanks,
> Nick
> 
>> Thanks,
>> -- Leon
>>
>> On Thu, Nov 1, 2007 at 11:57 PM, Ashutosh Tripathi
>> <Ashutosh.Tripathi at sun.com> wrote:
>>> Hi Leon, Jacob,
>>>
>>>        Both of you asked for more details about clhbsndr.
>>>
>>> Cluster membership is a detailed and complex topic, and
>>> I would encourage you to attend an Open HA Cluster user group
>>> meeting near you (or Sun Tech Days, where SC presents
>>> too) to learn about it in more depth. There you can meet cluster
>>> engineers face to face for detailed technical discussions.
>>>
>>>        In e-mail I can only go so deep, but I will try
>>> to answer your questions, at least at a high level. Please
>>> see below.
>>>
>>> Leon Koll wrote:
>>>> Hi Ashu,
>>>>
>>>> thank you for your efforts.
>>>> Two questions:
>>>> 1. The clhbsndr module is undocumented, which is why I am asking:
>>>> why does the cluster need it on PUBLIC interfaces?
>>>> My guess is that it's not needed there, but that it's much easier
>>>> to push it onto all interfaces than to find the private ones and
>>>> push it onto their stacks.
>>>        It is needed in some situations, particularly on Solaris 9,
>>> where network interrupts coming in on the public network can interfere
>>> with cluster heartbeats. The clhbsndr module helps in such situations.
>>>
>>>> 2. Another problem we saw: the nxge interface is not in the
>>>> /etc/iu.ap file. It looks like a resurrection of the five-year-old
>>>> bug 4643340. How does the cluster work with a private interconnect
>>>> on nxge interfaces without the clhbsndr module?
>>>        On the private interconnects a different mechanism is used,
>>> because the cluster framework controls the plumbing and setup of the
>>> network stack. For the public network, an update to the iu.ap file is
>>> needed, as you found in 4643340.
>>>
>>>        Always consult the SC support matrix for questions about
>>> specific hardware support.
>>>
>>> Jacob wrote:
>>>> Do you have an estimate for an official statement/release?
>>>        I don't have an estimate right now, and I hesitate to speculate
>>> on where this will go. One of my colleagues alerted me that
>>> there is an ongoing escalation on this issue, so rest assured
>>> that Sun is treating it as a high-priority issue.
>>>
>>> HTH,
>>> -ashu
>>>
>>>
>>>
>>>> On 11/1/07, Ashutosh Tripathi <Ashutosh.Tripathi at sun.com> wrote:
>>>>> Hi Leon,
>>>>>
>>>>> Thanks for getting back to us on this.
>>>>>
>>>>> We are still analyzing the issue and are not sure yet if the
>>>>> problem is with the clhbsndr module, its interactions with
>>>>> the Solaris STREAMS framework, or something else entirely.
>>>>>
>>>>> While we work on an official statement on this, I would
>>>>> suggest the following unofficial approach in the meantime.
>>>>>
>>>>>        Go ahead and run without the clhbsndr module on the
>>>>> cluster public interface, but be aware that if you log a
>>>>> support call on this cluster, particularly one related
>>>>> to cluster membership and heartbeats, the cluster support
>>>>> personnel may ask you to reproduce the issue without
>>>>> this interim fix.
>>>>>
>>>>> Hope that answers your questions,
>>>>>
>>>>> Best Regards,
>>>>> -ashu
>>>>>
>>>>>
>>>>> Leon Koll wrote:
>>>>>> Hi Ashu,
>>>>>>
>>>>>> I am working with Jacob on this problem.
>>>>>> The command you've sent fixed the problem.
>>>>>> Q1: Is it safe to remove the clhbsndr module from cluster public
>>>>>> interfaces?
>>>>>> Q2: Is it a known bug?
>>>>>>
>>>>>> Thanks a lot,
>>>>>> -- Leon
>>>>>>
>>>>>> On 11/1/07, Ashutosh Tripathi <Ashutosh.Tripathi at sun.com> wrote:
>>>>>>> Hi Jacob,
>>>>>>>
>>>>>>> Additionally,
>>>>>>>
>>>>>>> Can you remove the clhbsndr module from the e1000g adapter:
>>>>>>>
>>>>>>> e.g.: ifconfig e1000g0 modremove clhbsndr@2
>>>>>>>
>>>>>>> and report back what you find?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -ashu
>>>>>>>
>>>>>>>
>>>>>>> LaoTsao(Dr. Tsao) wrote:
>>>>>>>> Hi,
>>>>>>>> Maybe this is related to the IPMP that is required by Sun Cluster.
>>>>>>>> When you ran iperf -s server-IP, did you use the logical host IP
>>>>>>>> address of the server?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Jacob wrote:
>>>>>>>>
>>>>>>>>> Hi List,
>>>>>>>>> I've heard many good things about this list and the opensolaris
>>>>>>>>> community, and I sure hope someone here can help me out.
>>>>>>>>>
>>>>>>>>> One of our systems is suffering from very poor network throughput,
>>>>>>>>> which appears to be affected by Sun Cluster.
>>>>>>>>>
>>>>>>>>> The system consists of 3 T2000 machines in Sun Cluster 3.2 running
>>>>>>>>> on Solaris 10 u4.
>>>>>>>>> The network throughput in non-cluster mode is about 800 Mbit/s on a
>>>>>>>>> single e1000g interface.
>>>>>>>>> The network throughput falls by about 50% when booting the
>>>>>>>>> machine(s) in cluster mode.
>>>>>>>>> To isolate possible LAN problems, I connected two machines with
>>>>>>>>> a crossover cable, with the same result.
>>>>>>>>>
>>>>>>>>> The problem was reproduced by installing a brand new T2000 machine
>>>>>>>>> with similar configuration as a single node cluster.
>>>>>>>>>
>>>>>>>>> All throughput measurements were done using iperf.
>>>>>>>>>
>>>>>>>>> Has anyone encountered something similar?
>>>>>>>>> Does anyone have experience with T2000 machines in Sun Cluster with
>>>>>>>>> regard to network performance?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> This message posted from opensolaris.org
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> ha-clusters-discuss mailing list
>>>>>>>>> ha-clusters-discuss at opensolaris.org
>>>>>>>>> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss