Hi Leon, Jacob,

        Both of you asked for more details about clhbsndr.

Cluster membership is a detailed and complex topic, and
I would encourage you to attend an Open HA Cluster user group
meeting near you (or Sun Tech Days, where Sun Cluster also presents)
to learn about it in more depth. There you can meet cluster engineers
face to face for detailed, back-and-forth technical discussions.

        Over e-mail I can only go so deep, but I will try
to answer your questions, at least at a high level. Please
see below.

Leon Koll wrote:
> Hi Ashu,
> 
> thank you for your efforts.
> Two questions:
> 1. The clhbsndr module is undocumented, which is why I am asking:
> why does the cluster need it on PUBLIC interfaces?
> My guess: it's not needed there, but it's much easier to push it to
> all interfaces than to find the private ones and push it to their
> stack.

        It is needed in some situations, particularly on Solaris 9, where
network interrupts coming in on the public network can interfere with
cluster heartbeats. The clhbsndr module helps in such situations.
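
        To see whether the module is actually on a given interface's
stream, ifconfig's modlist subcommand lists the STREAMS modules from top
to bottom. The following is only a rough sketch: the helper name
module_at_pos is made up for illustration, and it assumes each modlist
line has the form "<position> <module-name>". It prints the
module@position form that ifconfig modremove takes.

```shell
# Hypothetical helper (not part of Sun Cluster): read `ifconfig <if> modlist`
# output on stdin and print "name@pos" for the named module.
# Assumption: each modlist line looks like "<position> <module-name>".
module_at_pos() {
    awk -v mod="$1" '
        $2 == mod { print $2 "@" $1; found = 1 }
        END { exit !found }    # non-zero exit status if the module is absent
    '
}

# On a cluster node this might be used as:
#   ifconfig e1000g0 modlist | module_at_pos clhbsndr
```

A non-zero exit status means the module was not found in the listing.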

> 
> 2. Another problem that we saw: the nxge interface is not in the
> /etc/iu.ap file. This looks like a resurrection of the five-year-old bug 4643340.
> How does the cluster work with a private interconnect on nxge interfaces
> without the clhbsndr module?

        On the private interconnects a different mechanism is used,
because the cluster framework itself controls the plumbing and setup of
the network stack there. For the public network, an update to the
/etc/iu.ap file is needed, as you found in 4643340.

        Always consult the Sun Cluster support matrix for questions
about specific hardware support.
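
        As a rough sketch of how one might check whether a driver has
an autopush entry for the module, the helper below scans an iu.ap-style
file. The name iuap_has_module is hypothetical, and the layout it assumes
("major minor lastminor module ...", with "#" starting a comment line) is
the usual autopush format; the exact entries Sun Cluster writes are not
shown in this thread, so treat this as an illustration only.

```shell
# Hypothetical helper: succeed if FILE has an autopush entry for DRIVER
# whose module list includes MODULE.
# Assumption: lines are "major minor lastminor module..." and "#" begins a comment.
iuap_has_module() {
    awk -v drv="$2" -v mod="$3" '
        $1 ~ /^#/ { next }                 # skip comment lines
        $1 == drv {
            for (i = 4; i <= NF; i++)      # fields 4 onward are module names
                if ($i == mod) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Possible use on a cluster node:
#   iuap_has_module /etc/iu.ap nxge clhbsndr || echo "no clhbsndr entry for nxge"
```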

Jacob wrote:
 > Do you have an estimate on official statement/release?

        I don't have an estimate right now, and I hesitate to speculate
on where this will go. One of my colleagues alerted me that
there is an ongoing escalation on this issue, so rest assured that
Sun is treating it as a high-priority issue.

HTH,
-ashu



> 
> On 11/1/07, Ashutosh Tripathi <Ashutosh.Tripathi at sun.com> wrote:
>> Hi Leon,
>>
>> Thanks for getting back to us on this.
>>
>> We are still analyzing the issue and are not sure yet if the
>> problem is with the clhbsndr module, its interactions with
>> the Solaris STREAMS framework, or something else entirely.
>>
>> While we are working on an official statement on this, I would
>> suggest the following unofficial approach in the meantime.
>>
>>         Go ahead and run without the clhbsndr module on the
>> cluster public interface, but beware that in case you log a
>> support call on this cluster, particularly if it is related
>> to cluster membership and heartbeats, the cluster support
>> personnel may request you to reproduce the issue without
>> this interim fix.
>>
>> Hope that answers your questions,
>>
>> Best Regards,
>> -ashu
>>
>>
>> Leon Koll wrote:
>>> Hi Ashu,
>>>
>>> I am working with Jacob on this problem.
>>> The command you've sent fixed the problem.
>>> Q1: Is it safe to remove the clhbsndr module from cluster public interfaces?
>>> Q2: Is it a known bug?
>>>
>>> Thanks a lot,
>>> -- Leon
>>>
>>> On 11/1/07, Ashutosh Tripathi <Ashutosh.Tripathi at sun.com> wrote:
>>>> Hi Jacob,
>>>>
>>>> Additionally,
>>>>
>>>> Can you remove the clhbsndr module from the e1000g adapter:
>>>>
>>>> e.g.: ifconfig e1000g0 modremove clhbsndr@2
>>>>
>>>> and report back what you find?
>>>>
>>>> Thanks,
>>>> -ashu
>>>>
>>>>
>>>> LaoTsao(Dr. Tsao) wrote:
>>>>> hi
>>>>> Maybe this is related to the IPMP that is required by Sun Cluster.
>>>>> When you ran iperf -s server-IP, did you use the logical host IP address of
>>>>> the server?
>>>>>
>>>>>
>>>>>
>>>>> Jacob wrote:
>>>>>
>>>>>> Hi List,
>>>>>> I've heard many good things about this list and the opensolaris
>>>>>> community, and I hope someone here can help me out.
>>>>>>
>>>>>> One of our systems is suffering from very poor network throughput,
>>>>>> which appears to be affected by Sun Cluster.
>>>>>>
>>>>>> The system consists of 3 T2000 machines in Sun Cluster (3.2) running on
>>>>>> Solaris 10 u4.
>>>>>> The network throughput in non-cluster mode is about 800 Mbit/s on a single
>>>>>> e1000g interface.
>>>>>> The network throughput falls by about 50% when booting the
>>>>>> machine(s) in cluster mode.
>>>>>> To isolate possible LAN problems, I connected two machines using a
>>>>>> crossover cable, with the same result.
>>>>>>
>>>>>> The problem was reproduced by installing a brand new T2000 machine with 
>>>>>> similar configuration as a single node cluster.
>>>>>>
>>>>>> All throughput measurements were done using iperf.
>>>>>>
>>>>>> Has anyone encountered something similar?
>>>>>> Does anyone have experience with T2000 machines in Sun Cluster with 
>>>>>> regard to Network performance?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> --
>>>>>>
>>>>>> This message posted from opensolaris.org
>>>>>>
>>>>>> _______________________________________________
>>>>>> ha-clusters-discuss mailing list
>>>>>> ha-clusters-discuss at opensolaris.org
>>>>>> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
>>>>>>
>>>>>>
