Aled Sage created BROOKLYN-603:
----------------------------------

             Summary: SoftwareProcess.restart(restartMachine: true) fails: keypair does not exist
                 Key: BROOKLYN-603
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-603
             Project: Brooklyn
          Issue Type: Bug
    Affects Versions: 1.0.0-M1
            Reporter: Aled Sage


For a {{VanillaSoftwareProcess}} entity, calling {{restart(restartMachine: 
true)}} fails with the error:
{noformat}
2018-09-28T01:13:35.382Z :  
{"timeMillis":1538097215363,"thread":"brooklyn-execmanager-bKgJW2xZ-51","level":"DEBUG","loggerName":"org.apache.brooklyn.util.core.task.BasicExecutionManager","message":"Exception running task Task[Cross-context execution: Invoking effector restart on Docker Entity with parameters {restartMachine=true}]@O8Dz2VIj (rethrowing): org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException: Error invoking restart at VanillaSoftwareProcessImpl{id=d0398d4n58}: Failed to get VM after 3 attempts. - First cause is org.jclouds.rest.ResourceNotFoundException: The key pair 'jclouds#qa-docker-ent-d0398d4n58#4ef' does not exist (listed in primary trace); plus 2 more (e.g. the last is org.jclouds.rest.ResourceNotFoundException: The key pair 'jclouds#qa-docker-ent-d0398d4n58#4ef' does not exist): ResourceNotFoundException: The key pair 'jclouds#qa-docker-ent-d0398d4n58#4ef' does not exist","endOfBatch":false,"loggerFqcn":"org.ops4j.pax.logging.slf4j.Slf4jLogger","threadId":208,"threadPriority":5}
{noformat}
The series of events is:
 # start the entity:
 ## call jclouds to provision the VM
 ## jclouds creates the keyPair and other incidental resources, and provisions 
the VM
 # the keyPair is deleted (by our 'cloud cleaner')
 # restart the entity:
 ## call jclouds to stop the VM
 ### jclouds stops the VM, and also attempts to delete incidental resources 
(e.g. keyPairs)
 ## call jclouds to create the VM
 ### jclouds creates a new keyPair
 ### jclouds nevertheless decides to use the old keyPair, because that name is 
still in the {{credentialsMap}}.
 ### VM creation fails: keypair does not exist.

 

At step 3.1.1, {{org.jclouds.ec2.compute.EC2ComputeService.deleteKeyPair}} does 
not find the keypair, so it does not call 
{{credentialsMap.remove(new RegionAndName(region, keyPair.getKeyName()))}} 
(where the map is declared as {{ConcurrentMap<RegionAndName, KeyPair> 
credentialsMap}}). The non-existent keypair name is therefore left in the 
credentialsMap.
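The following is a simplified model of that behaviour, not the jclouds source. It assumes the cloud can be represented as a set of key names and the cache as a map from a "region/group" string to a key name; the class and field names ({{DeleteKeyPairModel}}, {{keysInCloud}}) are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model, NOT the jclouds implementation. Plain strings stand in
// for RegionAndName and KeyPair.
class DeleteKeyPairModel {
    // Key pair names that currently exist in the (modelled) cloud.
    final Set<String> keysInCloud = new HashSet<>();
    // Cache of "region/group" -> key pair name, standing in for credentialsMap.
    final ConcurrentMap<String, String> credentialsMap = new ConcurrentHashMap<>();

    // Removes the cache entry only when the key pair is still found in the
    // cloud, which is the behaviour described in the paragraph above.
    void deleteKeyPair(String regionAndGroup) {
        String keyName = credentialsMap.get(regionAndGroup);
        if (keyName != null && keysInCloud.remove(keyName)) {
            credentialsMap.remove(regionAndGroup);
        }
        // If the key pair was already deleted externally (e.g. by a cleaner),
        // nothing is found and the stale cache entry stays behind.
    }
}
```

In this model, deleting the key from {{keysInCloud}} before calling {{deleteKeyPair}} leaves the cache entry in place, matching the stale state that the restart later trips over.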

At step 3.2.2, jclouds leaves the newly created (second) keyPair behind unused 
and instead uses the key name already in the map. The code is at 
{{org.jclouds.ec2.compute.strategy.CreateKeyPairAndSecurityGroupsAsNeededAndReturnRunOptions.createOrImportKeyPair}}:
{noformat}
   // base EC2 driver currently does not support key import
   protected String createOrImportKeyPair(String region, String group, TemplateOptions options) {
      RegionAndName regionAndGroup = new RegionAndName(region, group);
      KeyPair keyPair = makeKeyPair.apply(new RegionAndName(region, group));
      // make sure that we don't request multiple keys simultaneously
      // if there is already a keypair for the group specified, use it
      // otherwise create a new keypair and key it under the group and also the
      // regular keyname
      KeyPair origValue = credentialsMap.putIfAbsent(regionAndGroup, keyPair);
      if (origValue != null) {
         return origValue.getKeyName();
      }
{noformat}
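To illustrate why a stale entry wins, here is a minimal, self-contained sketch of the same {{putIfAbsent}} pattern. It is not jclouds code: plain strings stand in for {{RegionAndName}} and {{KeyPair}}, and {{StaleKeyPairDemo}} and {{createOrReuse}} are hypothetical names:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative stand-in only; strings replace RegionAndName and KeyPair.
class StaleKeyPairDemo {
    // Same shape as the quoted logic: an entry already in the map always
    // wins over the freshly created key pair.
    static String createOrReuse(ConcurrentMap<String, String> credentialsMap,
                                String regionAndGroup, String freshKeyName) {
        String origValue = credentialsMap.putIfAbsent(regionAndGroup, freshKeyName);
        return origValue != null ? origValue : freshKeyName;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> credentialsMap = new ConcurrentHashMap<>();
        // Initial start: the map is empty, so the new key name is cached.
        String first = createOrReuse(credentialsMap, "region/group", "keypair-1");
        // keypair-1 is then deleted in the cloud, but its map entry is never
        // removed. On restart a second key pair is created, yet the cached
        // name is returned instead of the new one.
        String second = createOrReuse(credentialsMap, "region/group", "keypair-2");
        System.out.println(first + " " + second); // prints "keypair-1 keypair-1"
    }
}
```

The second call returns the name of a key pair that no longer exists, which corresponds to the {{ResourceNotFoundException}} in the log above.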
There are a number of improvements we could make:

 1. Fix jclouds, so it clears out the non-existent keypair name from the 
credentialsMap as part of {{cleanUpIncidentalResources}}.
 2. When the entity stops the VM and then creates a new VM, pass in a different 
group name (e.g. with an incremented suffix - currently the group name is 
generated from the app/entity's name + id).
 3. Discourage use of {{restart(restartMachine: true)}} (e.g. the VM's IP 
address may well change - is the entity really implemented to support this? 
Would a user of this parameter expect the VM just to reboot, which would make 
it dangerous?).
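As a rough sketch of what options 1 and 2 might look like, under the same simplification that strings stand in for the jclouds types: all names here ({{RestartFixSketch}}, {{cleanUpCachedKeyPair}}, {{nextGroupName}}) are hypothetical, and neither method reflects existing jclouds or Brooklyn code.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of options 1 and 2; not existing jclouds/Brooklyn code.
class RestartFixSketch {
    // Option 1: always drop the cached entry during cleanup, whether or not
    // the cloud still reports the key pair.
    static void cleanUpCachedKeyPair(ConcurrentMap<String, String> credentialsMap,
                                     String regionAndGroup) {
        credentialsMap.remove(regionAndGroup);
    }

    // Option 2: derive a fresh group name for each provisioning attempt by
    // appending an incremented suffix, so an old cache entry is never hit.
    static String nextGroupName(String baseGroup, AtomicInteger counter) {
        return baseGroup + "-" + counter.incrementAndGet();
    }
}
```

With option 1 the stale name cannot survive a stop; with option 2 it can survive but is no longer looked up, because the new VM is requested under a different group key.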



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
