[jira] [Comment Edited] (FALCON-141) Support cluster updates

Balu Vellanki (JIRA) Thu, 03 Dec 2015 14:29:46 -0800

    [ 
https://issues.apache.org/jira/browse/FALCON-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038706#comment-15038706
 ]


Balu Vellanki edited comment on FALCON-141 at 12/3/15 10:28 PM:
----------------------------------------------------------------

[~sriksun] and [~venkatnrangan]: Here is a summary of our internal 
discussion. Based on all the discussions and meetings so far, I will create 
subtasks for this JIRA. 

In most real-world scenarios, updating a cluster from non-HA to HA requires 
the following steps.
1. Shut down Falcon.
2. Update the Hadoop cluster from non-HA to HA (or non-secure to secure).
3. Update Falcon configs such as startup.properties.
4. Start Falcon. 

The initial solution was to update the cluster entity after step 4, in a 
manner similar to updating a feed/process entity. This approach will not work 
because, when Falcon starts, SharedLibraryHostingService needs to connect to 
the HDFS of each cluster and copy the relevant jars to 
<cluster_working_dir>/lib/. Since the cluster entity still has the old values 
for the read/write endpoints, Falcon fails at this step and does not start. 
The solution is to start Falcon in a safe-mode, where Falcon can start without 
having to access the cluster entity's HDFS location. 
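
To make the safe-mode idea concrete, here is a minimal sketch of the startup 
decision in Python. It is only an illustrative model, not Falcon code: the 
`safe_mode` flag, the `services_to_start` function and the two placeholder 
service names are assumptions; only SharedLibraryHostingService comes from the 
discussion above.

```python
# Illustrative model only, not Falcon's actual startup code.
# "OtherServiceA" and "OtherServiceB" are placeholder names (assumptions).
STARTUP_SERVICES = [
    "OtherServiceA",
    "SharedLibraryHostingService",  # the service named in this comment
    "OtherServiceB",
]

# Services that must reach each cluster's HDFS while starting.
NEEDS_CLUSTER_HDFS = {"SharedLibraryHostingService"}


def services_to_start(safe_mode):
    """Return the services to initialize, in their configured order.

    In safe-mode, services that need to contact a cluster's HDFS are
    skipped, so a stale endpoint in a cluster entity cannot block startup.
    """
    if not safe_mode:
        return list(STARTUP_SERVICES)
    return [s for s in STARTUP_SERVICES if s not in NEEDS_CLUSTER_HDFS]
```

With this shape, a normal start is unchanged, and a safe-mode start simply 
leaves out the one service that would fail against the old endpoints.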

Proposed Solution :
---------------------------
# Shut down Falcon.
# Update the Hadoop cluster from non-HA to HA (or non-secure to secure).
# Start Falcon in safe-mode. In this mode:
#* Falcon starts without starting SharedLibraryHostingService. 
#* SuperUsers of Falcon (as specified in startup.properties) can update the 
Falcon cluster entity.
#* No other write operations are allowed. 
# Update the cluster entity in Falcon.
#* In this step, a SuperUser can update the existing cluster entity. The 
following fields can be updated by a SuperUser:
#** Cluster description, tags: Falcon will update the graphDB with the new 
description and tags.
#** All interfaces: The underlying read/write and workflow interfaces must 
remain the same; only the url:port may change.
#** All locations: Falcon will validate the new locations.
#** Properties
#* Get a lock on the cluster entity in the ConfigStore.
#* If any interface, location or property is updated, the feeds and processes 
dependent on this cluster must be updated in the workflow engine. This can 
only be done once Falcon is started in normal mode, so add a "requireUpdate" 
flag to each dependent entity in the ConfigStore to mark that the feed/process 
entity should be updated in the workflow engine. 
#* Commit the new cluster entity to the ConfigStore after validating it.
# Restart Falcon in normal mode. This requires the following tasks to be added 
to the existing Falcon startup:
#* Start SharedLibraryHostingService along with the other services.
#* Update the coordinator/bundle of all feed/process entities that were flagged 
during the cluster entity update. The user of the coord/bundle should be the 
same as the previous owner of the coord/bundle. 
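
The cluster-entity update step above can be sketched as follows. This is a 
toy in-memory model in Python, assuming a hypothetical `ConfigStore` class, a 
placeholder `validate` check and dict-based entities; none of these are 
Falcon's real APIs. It also assumes the update is rejected outside safe-mode, 
which the proposal implies but does not state. The point is the ordering: 
lock, validate, flag dependents with requireUpdate, then commit.

```python
import threading


class ConfigStore:
    """Toy in-memory stand-in for the ConfigStore (hypothetical API)."""

    def __init__(self):
        self.clusters = {}   # cluster name -> definition dict
        self.entities = {}   # feed/process name -> {"clusters": set, "requireUpdate": bool}
        self.lock = threading.Lock()


def validate(cluster_def):
    """Placeholder validation: only checks that required sections exist."""
    for key in ("interfaces", "locations"):
        if not cluster_def.get(key):
            raise ValueError("missing " + key)


def update_cluster(store, name, new_def, user, superusers, safe_mode):
    """Apply a cluster update and return the names of flagged dependents."""
    if not safe_mode:  # assumption: updates are only accepted in safe-mode
        raise RuntimeError("cluster update requires safe-mode")
    if user not in superusers:
        raise PermissionError(user + " is not a SuperUser")
    with store.lock:  # hold the lock for the whole update
        old_def = store.clusters[name]
        validate(new_def)  # a failure here leaves the store untouched
        changed = any(
            old_def.get(k) != new_def.get(k)
            for k in ("interfaces", "locations", "properties")
        )
        flagged = []
        if changed:
            for ent_name, ent in store.entities.items():
                if name in ent["clusters"]:
                    ent["requireUpdate"] = True
                    flagged.append(ent_name)
        store.clusters[name] = new_def  # commit after validation
        return sorted(flagged)
```

A change to only the description or tags flags nothing in this sketch, 
because those fields do not affect the workflow engine.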
 
Handling failures in cluster update :
-----------------------------------------------
1. If the cluster entity update fails, throw an exception to the SuperUser 
with the reason. The user will have to fix the problem and retry the cluster 
entity update.
2. If the update of the coord/bundle of a dependent entity fails,
    * Retain the requireUpdate flag on the entity.
    * Continue updating the bundle/coord of the remaining feed/process 
entities. 
    * At the end of Falcon start, show a warning to the user with the list of 
entities that could not be updated. 
    * Provide the ability to update the bundle/coord of flagged entities 
individually. 
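
The failure handling for dependent entities could look roughly like the 
sketch below. It is a Python model under assumptions: entities are plain 
dicts, and `update_in_workflow_engine` is a hypothetical callback standing in 
for whatever updates the coord/bundle. It shows the "retain the flag, keep 
going, report at the end" behaviour described above.

```python
def update_flagged_entities(entities, update_in_workflow_engine):
    """Update every flagged entity and return a list of (name, reason) failures.

    entities: dict of entity name -> {"requireUpdate": bool, ...}
    update_in_workflow_engine: hypothetical callable taking an entity name
        and raising an exception if the coord/bundle update fails.
    """
    failed = []
    for name in sorted(entities):
        entity = entities[name]
        if not entity.get("requireUpdate"):
            continue
        try:
            update_in_workflow_engine(name)
        except Exception as exc:
            # Keep requireUpdate set so the entity can be retried
            # individually later, and move on to the next entity.
            failed.append((name, str(exc)))
        else:
            entity["requireUpdate"] = False
    return failed
```

The returned list is what would feed the warning shown at the end of Falcon 
start, and the retained flags identify which entities still need an 
individual update.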
   



was (Author: bvellanki):
[~sriksun] and [~venkatnrangan] : Here is the summary of our internal 
discussion. Based on the summary of all discussion and meetings, I will create 
subtasks for this jira. 

In most real-world scenarios, updating a cluster from non-HA to HA will require 
the following steps.
1. Shutdown falcon.
2. Update hadoop cluster from non-HA to HA (or non-secure to secure).
3. Update falcon configs like startup.properties
4. Start falcon. 

The initial solution was to update cluster entity after step-4 in a manner 
similar to updating a feed/process entity. This approach will not work because,
- When falcon is started, SharedLibraryHostingService needs to connect to hdfs 
of each cluster and copy relevant jars to <cluster_working_dir>/lib/.  Since 
the cluster entity has old values for read/write endpoints, Falcon will fail in 
this step and will not start.  The solution is to start falcon in a safe-mode, 
where falcon can start without having to access cluster-entity's hdfs location. 

Proposed Solution :
---------------------------
1. Shutdown falcon.
2. Update hadoop cluster from non-HA to HA (or non-secure to secure).
3. Start Falcon in safe-mode. In this mode 
   - Falcon starts without starting SharedLibraryHostingService. 
   - SuperUsers of falcon (as specified in startup.properties) can update 
falcon cluster-entity
   - No other write-operations are allowed in safe-mode. 
4. Update cluster-entity in Falcon
   - In this step, a SuperUser can update the existing cluster entity. The 
following fields can be updated by SuperUser
        a) Cluster description, tags : Falcon will update the graphDB with new 
description and tags
        b) All interfaces : The underlying read/write and workflow interfaces 
should be the same even if the url:port has changed
        c) All locations : Falcon will validate new locations.
        d) Properties
   - Get a lock on cluster entity in the ConfigStore.
   - If any interface, location or property is updated, the feeds and processes 
dependent on this cluster should be updated in the workflow engine. This should 
be done once Falcon is started in normal mode. So add a "requireUpdate" flag to 
each entity in the ConfigStore to specify that the feed/process entity should 
be updated in workflow engine. 
   - Commit the new cluster entity to ConfigStore after validating entity.
5. Re-start Falcon in normal mode. This requires following tasks to be added to 
existing falcon startup. 
   - start SharedLibraryHostingService along with other services.
   - Update coordinator/bundle of all feed/process entities that are flagged 
during cluster entity update. The user of coord/bundle should be same as the 
prev owner of coord/bundle. 
 
Handling failures in cluster update :
-----------------------------------------------
1. If update cluster entity fails, throw an exception to SuperUser with the 
reason. The user will have to fix and restart cluster entity.
2. If update of coord/bundle of dependent entity fail,
    - Retain the requireUpdate flag on the entity.
    - Continue updating bundle/coord of remaining feed/process entities. 
    - At the end of Falcon start, show warning to user with list of entities 
that could not be updated. 
    - Provide ability to update bundle/coord of flagged entities individually. 
   


> Support cluster updates
> -----------------------
>
>                 Key: FALCON-141
>                 URL: https://issues.apache.org/jira/browse/FALCON-141
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Shwetha G S
>            Assignee: Ajay Yadava
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
