GFac App Catalog integration

Lahiru Gunathilake Mon, 29 Sep 2014 07:17:27 -0700

Hi All,

We have replaced the XML descriptions we used to describe applications and
hosts with a new design (App Catalog) [1], so GFac now has to change to adapt
to the new App Catalog. Before jumping into the integration, I think it's a
good time to review the GFac architecture and modify it if necessary.

According to the current GFac architecture, we have plugins which can be
executed in a chain. This chain is configured in an XML file [2] and is
selected based on the computing resource type (taken from the old XML
descriptions) [3]. Basically, we can configure one execution chain for a
GSISSH type resource, another chain for an SSHType resource, and another one
for some other type (like EC2). Currently we differentiate hosts based on
their authentication mechanism. This architecture leads to the following
limitations.

1. There could be scenarios where we need a different execution pattern
for the same host type. With this model we cannot have two execution
chains for the same host type.
2. There could be cases where we want to run a particular handler only for
a given machine within a host type (e.g. for stampede, run Handler1 at
the end, but not for any other machine).
3. Differentiating hosts based on authentication doesn't look right,
because we have a few machines that authenticate with different mechanisms
but are otherwise the same. This problem has already been solved in the App
Catalog design, which lists the available authentication mechanisms for a
given compute resource.
4. Currently the execution chain is picked up front based on the host and
then executed, but we do not have a fallback execution chain or any fault
tolerance at the experiment level. Do we have to provide fault tolerance at
this level, or should we just mark the experiment as failed and let the
Orchestrator send another job request to GFac with a different computing
resource or a different authentication mechanism (if authentication failed)
in a failure scenario? Note: the only fault tolerance we have implemented
covers the case where a particular GFac instance stops responding, which is
very rare unless we have a very heavy load. In that case another GFac
instance can pick up the execution chain from the checkpointed location and
execute the rest of the chain. This works because the chain-picking logic is
statically configured in an XML file and the plugins themselves can implement
a recover method, which the GFac core invokes during recovery.
5. Currently there is a way to configure a chain based on the gateway
name, but this is a simple configuration: one gateway name can have only one
configuration, and it will be the same for any host. (We can improve that by
embedding the configuration for each host into an outer config which provides
the gateway name, so that a given gateway name can have a given set of
execution chains.) I think GFac execution should be customizable for each
gateway without interfering with other gateways.
6. Currently the plugin implementations (EC2, GSISSH, SSH, etc.) are
independent of each other, and they all depend on gfac-core. If there are
use cases where a mix of these plugins has to run, we can configure them in a
single execution chain, and as long as all the plugin artifacts are on the
GFac classpath things should work out of the box. But in such a scenario, how
do we configure the chain? (It will be hard to configure based on a
particular host.)
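
To make limitations 1 and 2 concrete, here is a rough sketch of a lookup
keyed only by host type. It is written in Python purely for illustration; the
handler names and the dictionary layout are my assumptions and are not taken
from gfac-config.xml or the GFac code.

```python
# Hypothetical sketch of a chain lookup keyed only by host type.
# Handler names and host-type keys are illustrative, not GFac identifiers.
CHAINS_BY_HOST_TYPE = {
    "GSISSH": ["InputStaging", "JobSubmission", "OutputStaging"],
    "SSH": ["InputStaging", "JobSubmission", "OutputStaging"],
    "EC2": ["InstanceSetup", "JobSubmission"],
}


def pick_chain(host_type):
    """Return the single chain configured for this host type."""
    return CHAINS_BY_HOST_TYPE[host_type]


# Two different GSISSH machines necessarily get the same chain, so there is
# no place to say "run Handler1 at the end, but only on stampede".
stampede_chain = pick_chain("GSISSH")
other_chain = pick_chain("GSISSH")
```

Because the host type is the only key, any per-machine or per-gateway
variation has nowhere to live in this model.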

My suggestion is to introduce a more advanced XML configuration language with
a defined precedence for picking chains. Of course there are n! ways to
order n plugins, but in practice the number of chains will be very low.
We would have a precedence over the properties used to select hosts (e.g.
gateway name, authentication type, host address, gateway user name, or some
other property like cpuCount > 10 on host stampede). We can come up with a
proper precedence order based on how specific the configuration is, and we
should be able to share an execution chain between multiple gateways, etc. (We
can discuss a nice way to configure the XML which covers most of the
limitations explained above.)
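
One possible reading of "precedence based on how specific the configuration
is" is sketched below in Python. The rule format, the property names
(gateway, auth, host), and the choice to measure specificity by the number of
matched properties are all assumptions for discussion, not an existing GFac
format.

```python
# Hypothetical rule format: each rule lists the properties it requires and
# the chain to run. Property names and handlers are illustrative only.
RULES = [
    {"match": {}, "chain": ["Submit"]},
    {"match": {"auth": "GSISSH"}, "chain": ["Stage", "Submit"]},
    {"match": {"auth": "GSISSH", "host": "stampede"},
     "chain": ["Stage", "Submit", "Handler1"]},
    {"match": {"gateway": "gw1", "auth": "GSISSH", "host": "stampede"},
     "chain": ["Stage", "Submit", "Handler1", "Notify"]},
]


def select_chain(request, rules=RULES):
    """Pick the chain of the most specific rule whose conditions all hold."""
    candidates = [
        r for r in rules
        if all(request.get(k) == v for k, v in r["match"].items())
    ]
    if not candidates:
        raise LookupError("no chain configured for this request")
    # Assumption: specificity is simply the number of matched properties.
    best = max(candidates, key=lambda r: len(r["match"]))
    return best["chain"]
```

A rule with an empty match acts as the default chain. Conditions such as
cpuCount > 10 would need predicates rather than equality checks, which this
sketch leaves out.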

To address problem #4: if we decide GFac has to act smart and avoid
failing an execution, we can come up with some fault chain with a
precedence. But since App Catalog already has a way to define the
precedence, the Orchestrator can fall back and submit another job to GFac, so
GFac can act in a stateless way to handle a particular request (everything
from App Catalog is finalized when the request comes to GFac; GFac just
has to find the right set of handlers to use for this request).
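
The Orchestrator-side fallback could look roughly like the Python sketch
below. The function names, the JobFailed exception, and the shape of an
"option" are hypothetical; gfac_submit is only a stand-in that fails for
GSISSH so that the fallback path is exercised.

```python
class JobFailed(Exception):
    """Raised by the stand-in GFac call when a request cannot be run."""


def gfac_submit(option):
    """Stand-in for a stateless GFac call: runs one finalized request."""
    if option["auth"] == "GSISSH":
        raise JobFailed("authentication failed")
    return "submitted to %s via %s" % (option["host"], option["auth"])


def orchestrate(options, submit=gfac_submit):
    """Try each option in precedence order until one succeeds."""
    errors = []
    for option in options:
        try:
            return submit(option)
        except JobFailed as exc:
            # Remember the failure and fall back to the next option.
            errors.append((option, str(exc)))
    raise JobFailed("all options failed: %r" % errors)


# Assumed to arrive already ordered by the App Catalog precedence.
options = [
    {"host": "stampede", "auth": "GSISSH"},
    {"host": "stampede", "auth": "SSH"},
]
result = orchestrate(options)
```

With this split, GFac keeps no retry state: each call either runs the
finalized request or fails, and the retry policy lives in one place.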

We can also change the static nature of the configuration by storing it
in the registry, so that while GFac is up and running (once we have a
proper Admin UI) the configuration can be modified by the admin. If a new
requirement comes up for a particular gateway, we can implement a plugin and
configure that plugin.
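
As a minimal sketch of the idea, assuming the chain configuration is read
from the registry on every request: InMemoryRegistry below is a hypothetical
stand-in written in Python, not the Airavata registry API, and the key and
handler names are made up.

```python
class InMemoryRegistry:
    """Stand-in for the registry; a real one would be persistent."""

    def __init__(self):
        self._chains = {}

    def set_chain(self, key, handlers):
        self._chains[key] = list(handlers)

    def get_chain(self, key):
        return list(self._chains[key])


def handlers_for(registry, key):
    # Read on every request, so admin edits apply without a restart.
    return registry.get_chain(key)


registry = InMemoryRegistry()
registry.set_chain("gw1", ["Stage", "Submit"])
before = handlers_for(registry, "gw1")

# An admin adds a newly implemented plugin while the server keeps running.
registry.set_chain("gw1", ["Stage", "Submit", "NewPlugin"])
after = handlers_for(registry, "gw1")
```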

If you have any ideas, that would be great.

Regards
Lahiru

[1]
https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Application+Catalog
[2]
https://github.com/apache/airavata/blob/master/modules/configuration/server/src/main/resources/gfac-config.xml
[3]
https://github.com/apache/airavata/tree/master/modules/commons/gfac-schema/src/main/resources/schemas

--
Research Assistant
Science Gateways Group
Indiana University
