Aled Sage created BROOKLYN-560:
----------------------------------

             Summary: MachineEntity rebind sometimes fails to add machine metrics feed
                 Key: BROOKLYN-560
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-560
             Project: Brooklyn
          Issue Type: Bug
            Reporter: Aled Sage

{{MachineEntityJcloudsRebindTest.testRebind}} fails non-deterministically in 1.0.0-SNAPSHOT:

{noformat}
2017-11-10 21:22:12,481 INFO  TESTNG FAILED: "Surefire test" - org.apache.brooklyn.entity.machine.MachineEntityJcloudsRebindTest.testRebind() finished in 30882 ms
java.lang.AssertionError: failed succeeds-eventually, 75 attempts, 30001ms elapsed: AssertionError: Commands (/etc/os-release) not contain in [ExecCmd{...},...]
	at org.apache.brooklyn.entity.machine.MachineEntityJcloudsRebindTest.assertRecordedSshCmdContainsEventually(MachineEntityJcloudsRebindTest.java:117)
	at org.apache.brooklyn.entity.machine.MachineEntityJcloudsRebindTest.testRebind(MachineEntityJcloudsRebindTest.java:103)
Caused by: java.lang.AssertionError:
	at org.apache.brooklyn.entity.machine.MachineEntityJcloudsRebindTest.assertRecordedSshCmdContainsEventually(MachineEntityJcloudsRebindTest.java:117)
	at org.apache.brooklyn.entity.machine.MachineEntityJcloudsRebindTest.testRebind(MachineEntityJcloudsRebindTest.java:103)
{noformat}

In terms of a bug in production code: when you rebind to a {{MachineEntity}}, it sometimes fails to add the {{machineMetricsFeed}} feed to the entity again, which means sensors like {{machine.loadAverage}} or {{machine.cpu}} will not be updated.

The problem is that {{SoftwareProcessImpl.callRebindHooks()}} schedules a task to call {{connectSensors}} at a random time within the next 10 seconds. If that task executes very soon (before the current thread has got far enough with the rest of rebind), then the call from {{connectSensors}} to {{JcloudsSshMachineLocation.inferMachineDetails}} fails: it sees that {{JcloudsSshMachineLocation.isManaged()}} is still false. It therefore creates an empty {{BasicOsDetails}}, rather than executing the {{os-details.sh}} script.

You can make this test fail consistently by changing {{MachineEntity.MAXIMUM_REBIND_SENSOR_CONNECT_DELAY}} to {{0}} (rather than the {{100ms}} that is set in the test).

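To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not Brooklyn code: {{FakeLocation}}, {{runRebind}} and the returned strings are hypothetical stand-ins, and the ordering is forced with a latch so that both outcomes are reproducible rather than timing-dependent.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class RebindRaceSketch {

    /** Hypothetical stand-in for a machine location; NOT Brooklyn's JcloudsSshMachineLocation. */
    static final class FakeLocation {
        private final AtomicBoolean managed = new AtomicBoolean(false);

        void setManaged() {
            managed.set(true);
        }

        /** Mimics the reported behaviour: while unmanaged, return empty details instead of running the script. */
        String inferMachineDetails() {
            return managed.get() ? "os-details from script" : "";
        }
    }

    /**
     * Simulates a rebind that schedules the sensor-connecting task with zero delay.
     * If taskRunsBeforeManaged is true, the task finishes before the location is
     * marked as managed (the failing order); otherwise the task waits for it.
     */
    static String runRebind(boolean taskRunsBeforeManaged) throws Exception {
        FakeLocation location = new FakeLocation();
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch managedSet = new CountDownLatch(1);
        try {
            Future<String> details = executor.schedule(() -> {
                if (!taskRunsBeforeManaged) {
                    managedSet.await(); // wait until the rest of "rebind" has completed
                }
                return location.inferMachineDetails();
            }, 0, TimeUnit.MILLISECONDS);

            if (taskRunsBeforeManaged) {
                details.get(5, TimeUnit.SECONDS); // let the task win the race
            }
            location.setManaged(); // the rest of "rebind" catches up here
            managedSet.countDown();
            return details.get(5, TimeUnit.SECONDS);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("task ran early: [" + runRebind(true) + "]");
        System.out.println("task ran late:  [" + runRebind(false) + "]");
    }
}
```

In this sketch the early run yields empty details, corresponding to the empty {{BasicOsDetails}} seen in the failing test, while the late run yields the script output.
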
The stack trace of the job scheduled by {{SoftwareProcessImpl.callRebindHooks}} is shown below:

{noformat}
Daemon Thread [brooklyn-execmanager-Vv9IUPme-1] (Suspended)
	owns: Object  (id=571)
	JcloudsSshMachineLocation.inferMachineDetails() line: 533
	JcloudsSshMachineLocation(SshMachineLocation).getMachineDetails() line: 1037
	JcloudsSshMachineLocation(SshMachineLocation).getOsDetails() line: 1018
	MachineEntityImpl.connectSensors() line: 58
	SoftwareProcessImpl$2.call() line: 402
	SoftwareProcessImpl$2.call() line: 1
	BasicExecutionManager$ScheduledTaskCallable$1.call() line: 476
	BasicExecutionManager$SubmissionCallable<T>.call() line: 565
	FutureTask<V>.run() line: 266
	ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1142
	ThreadPoolExecutor$Worker.run() line: 617
	Thread.run() line: 745
{noformat}

Note this relates to https://issues.apache.org/jira/browse/BROOKLYN-425, which reported similar symptoms of the feed not being registered, although there it happened every time.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)