Abhilash,

Thanks for posting this proposal. I have a few questions and comments 
that I've embedded in the text below.

Regards,

Tim
---


On 12/21/08 10:51, Abhilash T.G wrote:
> Hi,
> 
> Here is what we plan to do for CFF. Could you all give us more suggestions?
> 
> Introduction
> IBM WebSphere Application Server provides developers and IT architects
> with an innovative, performance-based foundation to build, reuse, run,
> integrate and manage Service Oriented Architecture (SOA) applications and
> services. WebSphere Application Server supports the Solaris Operating
> System, and runs applications and services in a highly available,
> secure and scalable environment. The purpose of this project is to
> write an agent that uses Sun Cluster to make the WebSphere server highly
> available. The agent will be preprogrammed to start, shut down,
> fault-monitor, and perform automatic failover for the WebSphere
> service.

Just for the sake of clarity, it might be more helpful to defer the 
'purpose' text to the end of the proposal once you've described all the 
background information. Having it here got me confused when I then read 
about the in-built clustering.

> HA capabilities of WebSphere server
> IBM WebSphere Application Server Network Deployment V6 offers a
> built-in application server clustering function and the HA Manager for
> protecting WebSphere singleton services such as:
> 
> Transaction service - Transaction log recovery
> Messaging service - Messaging engine restarting
> 
> The HA Manager runs as a service within each application server
> process and monitors the health of WebSphere clusters. In the event
> of a server failure, the HA Manager fails over the singleton
> service and recovers any in-flight transactions.
> The Workload Manager (WLM) and the HA Manager use a non-IP-based
> approach to cluster failover.
> WebSphere's built-in failover service is complementary to Sun
> Cluster, which is IP-based. Together they can achieve
> high availability of WebSphere Application Server.

So is this project going to be in two parts?
1. An HA agent for WebSphere server for when you don't run a WebSphere cluster
2. An HA agent for the HA-manager within the WebSphere cluster

Or are you only going to do one of these? You'll have to forgive my 
ignorance of WebSphere, I've only used Glassfish (albeit briefly).

>  USE OF SUN CLUSTER TO MAKE WEBSPHERE HIGHLY AVAILABLE
> 
> The Sun Cluster system achieves high availability through a
> combination of hardware and software. Redundant cluster
> interconnects, storage and public networks protect against single
> points of failure. The cluster software continuously monitors the
> health of member nodes and prevents failing nodes from participating
> in the cluster, protecting against data corruption. Also, the cluster
> monitors services and their dependent system resources, and fails over
> or restarts services in case of failures.
> Sun Cluster uses agents to simplify cluster configuration. Each type
> of resource supported in a cluster is associated with an agent. An
> agent is an installed program designed to control a particular
> resource type. The Resource Group Manager (RGM) handles high availability
> and scalability in Sun Cluster. It manages the resource types,
> resource groups and resources.
> Sun Cluster is an IP-based cluster failover service. This approach
> relies on a virtual IP address, or IP alias, which is active on only
> one system at a time. In case of a failover, the IP alias is moved
> to the other system. All applications or services have to use the
> virtual IP address to access the cluster.

Strictly speaking, we also have the global IP (GIF node) function,
where a scalable service such as Sun Java Web Server can bind to the
GIF and send packets back out locally. Maybe having a WebSphere cluster
consume load-balanced packets could be of value too? It would avoid a
separate load-balancer layer. This really would be added value (IMHO).
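The IP-based failover described in the proposal (a failover resource group holding the virtual IP address and the application, online on exactly one node at a time and moved when that node fails) can be illustrated with a toy Python model. This is only a sketch of the concept under simplifying assumptions: the class names, the group name and the resource names are hypothetical, and it says nothing about how the Sun Cluster RGM is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ResourceGroup:
    """Toy failover group (hypothetical): its resources, e.g. a virtual IP
    plus the application, are online together on one node at a time."""
    name: str
    resources: List[str]
    nodelist: List[Node]          # candidate nodes, in preference order
    owner: Optional[Node] = None

    def bring_online(self) -> None:
        # Pick the first healthy node in the preference list.
        self.owner = next((n for n in self.nodelist if n.healthy), None)
        if self.owner is None:
            raise RuntimeError(f"no healthy node for {self.name}")

    def monitor(self) -> None:
        # Fail over only if the current owner has failed (no failback).
        if self.owner is None or not self.owner.healthy:
            self.bring_online()


a, b = Node("node-a"), Node("node-b")
rg = ResourceGroup("websphere-rg", ["virtual-ip 192.0.2.10", "websphere"], [a, b])
rg.bring_online()
a.healthy = False      # simulate a node crash
rg.monitor()
print(rg.owner.name)   # node-b
```

Clients keep using the same virtual address throughout; only the node hosting the group changes, which is what lets the proposal treat Sun Cluster's IP-based failover as complementary to WebSphere's non-IP-based HA Manager.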

> Sun Cluster can thus:
> 1. Reduce or eliminate system downtime because of software or hardware
>    failures.
> 2. Ensure availability of data and applications to users, regardless
>    of the kind of failure that would normally take down a single-server
>    system.
> 3. Increase application throughput by enabling services to scale to
>    additional processors by adding nodes to the cluster.
> 4. Provide enhanced availability of the system by enabling you to
>    perform maintenance without shutting down the entire cluster.
> 
> Sun Cluster configurations tolerate the following types of
> single-point failures:
> 1. Server operating environment failure because of a crash or a panic
> 2. Data service failure
> 3. Server hardware failure
> 4. Network interface failure
> 5. Disk media failure
>
> USE OF Generic Data Service (GDS) TEMPLATE FOR AGENT DEVELOPMENT
> All applications need start, usually stop, validation and probing
> methods. In-depth probing has to be done by the agent. Agents can be
> developed using one of three models:
> - scdsbuilder using C
> - scdsbuilder using ksh
> - Generic Data Service (GDS)
>
> We will use GDS-based development because it offers the following
> advantages:
>
> - No coding against the cluster framework
> - Fewer bugs
> - Automatically inherits all future improvements
> - Easy to debug
> - Can be tested before registering
> - Can be debugged on customer systems
> - Deals with the application, not with the framework
> - Ready for the container sczsh component

I would recommend you use the GDS advanced toolkit/framework mechanism 
developed by my colleagues. This is the same one that is used to 
implement many of our GDS-based services, e.g. the HA-container agent. 
Hopefully one of them will reply to this posting after the New Year.
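For the in-depth probing mentioned in the proposal, a GDS probe command is essentially a program whose exit status reports the application's health. Below is a minimal Python sketch. It assumes that a TCP connect to the WebSphere listener is an acceptable liveness check (a real agent would probably probe deeper, e.g. with an HTTP request), and it assumes the GDS convention of exit status 0 for healthy and 100 for complete failure; check the SUNW.gds documentation before relying on those values. The function name, constants and default timeout are hypothetical.

```python
import socket

# Assumed GDS probe exit codes: 0 = healthy, 100 = complete failure.
# Verify against the SUNW.gds documentation before use.
PROBE_OK = 0
PROBE_FAILED = 100


def probe(host: str, port: int, timeout: float = 5.0) -> int:
    """Return PROBE_OK if a TCP connection to host:port succeeds
    within `timeout` seconds, otherwise PROBE_FAILED."""
    try:
        # create_connection raises OSError (including timeouts and
        # refused connections) if the service is not reachable.
        with socket.create_connection((host, port), timeout=timeout):
            return PROBE_OK
    except OSError:
        return PROBE_FAILED
```

To use it as the GDS probe command, a thin wrapper would call `sys.exit(probe(host, port))` with the logical hostname and port of the WebSphere service, so the exit status is what GDS sees.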


-- 

Tim Read
Staff Engineer
Solaris Availability Engineering
Sun Microsystems Ltd
Springfield
Linlithgow
EH49 7LR

Phone: +44 (0)1506 672 684
Mobile: +44 (0)7802 212 137

