Hi Taylor,

Glad to hear you have a workaround.

I'm surprised that snapshotting while the service is running would cause a problem, from the Brooklyn perspective. When persisting to the filesystem, we are careful to ensure files are always in a valid state (e.g. write a tmp file and then do an atomic move to overwrite the previous file).
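
To illustrate the "write a tmp file, then atomic move" pattern, here is a minimal Python sketch. It is not Brooklyn's actual code (Brooklyn is written in Java); the function name atomic_write and the file name in the usage line are made up for illustration. It assumes the temporary file and the target are on the same filesystem, which is what allows the rename to be atomic.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (hypothetical helper).

    A reader sees either the complete old file or the complete new one,
    never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so both are on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the OS to push the bytes to disk before the rename.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place over the old one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Usage would look like atomic_write("entity.xml", "<entity>...</entity>"). Note that a disk-level snapshot can still capture the directory at a point where some files have been updated and others have not, so this protects individual files rather than the state as a whole.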

If you have more details of the problem you're seeing, that would be useful. For example, what exception(s) are shown in the log on rebind? Can you share a copy of the persisted state for which rebind fails? (Be careful sharing that: if you're not using "externalised configuration" [1] for credentials, the persisted state could contain your ssh key, cloud credentials, etc.)

Aled

[1] https://brooklyn.apache.org/v/latest/ops/externalized-configuration.html


On 31/07/2017 22:21, Taylor wrote:
Thanks for the input about the external object store Duncan and Robert.

I will be reviewing the options and testing them soon.

I was able to reproduce the state corruption several times. I am not sure what 
the issue is, but the workaround is to gracefully power down the VM hosting 
Brooklyn and then take a snapshot.

I don't think I will look any further into this. The workaround is acceptable 
and ultimately I think moving to an object store is the best move.

Thanks,

Taylor

________________________________
From: Duncan Grant <[email protected]>
Sent: Monday, July 31, 2017 10:55 AM
To: Taylor; [email protected]
Subject: Re: Booklyn fails to start

Taylor,

I'd say that persistence is fairly robust in brooklyn as it's heavily used and 
well tested.

We use file-system based backup [1] in many cases and I haven't heard of anyone 
having the problem you describe.  Which makes me think it has something to do 
with using snapshots.  But that seems like it should be fine even in the case 
that brooklyn is writing to persistence when you take the snapshot (otherwise 
I'd expect a number of processes to be corrupted every time you took a 
snapshot). I'd be interested to know whether you see the same issue when you use 
the backup approach described here [2].

Having said that, in most cases I recommend using an object store (e.g. s3) for 
persistence.  This should make persistence more reliable, allows you to make 
use of versioning, is necessary for running brooklyn in HA, etc.

For debugging issues with persistence I think that brooklyn itself is the best 
option.  Using the rebind.failureMode.rebind=fail_at_end [3] option in 
brooklyn.properties and then examining the log output usually makes it clear 
where something has gone wrong.  You can also fairly easily edit the persistence 
files with a text editor, as they are human-readable and basically describe 
entities, state, and relationships.

Regards

Duncan

[1] 
https://brooklyn.apache.org/v/latest/ops/persistence/index.html#persisted-state-backup


[2] 
https://brooklyn.apache.org/v/latest/ops/persistence/index.html#file-system-backup


[3] 
https://brooklyn.apache.org/v/latest/ops/persistence/index.html#ignore-errors



On Mon, 31 Jul 2017 at 15:04 Taylor <[email protected]> wrote:


Duncan,

Thanks so much for the thorough response!

I will be reviewing the links you sent today.

With respect to the snapshot: I am running brooklyn on a CentOS VM hosted on 
XenServer. Since the original email I have been experimenting with snapshots to 
try and diagnose what the issue is. The only way I can take a snapshot and 
revert is if I stop the service and power off the vm before taking a snapshot 
(disk only, no memory). If I take the snapshot while the service is running, or 
after the service is stopped, the persisted state will get corrupted.

This has me worried for the case of a production outage.

Are there any tools to aid in fixing the persisted state manually?

What mechanism is safe for backing up the persisted state? Can I back up while 
the service is running?

Thanks,

Taylor

________________________________
From: Duncan Grant <[email protected]>
Sent: Monday, July 31, 2017 3:31 AM
To: [email protected]
Cc: Taylor
Subject: Re: Booklyn fails to start

Taylor,

The error you're seeing is with Brooklyn failing to rebind to persisted state 
[1].  Could you explain what you mean when you are talking about taking a 
snapshot and then reverting to the snapshot (do you mean the VM image where you 
are running brooklyn?)

There are a couple of ways to deal with problems with persisted state.  You can 
either fix the persisted state manually[2] or you can have brooklyn ignore 
errors with persisted state when it starts [4].  Both of these run the risk of 
brooklyn becoming detached from existing applications, so back up your 
persistence directory (or object store) first.

Let me know if this helps (or doesn't) or I'm on IRC just now if you'd like 
some answers in real-time.

Regards

Duncan

[1]https://brooklyn.apache.org/v/latest/ops/persistence/index.html#rebinding-to-state



[2]https://brooklyn.apache.org/v/latest/ops/persistence/index.html#determine-underlying-cause



[3]https://brooklyn.apache.org/v/latest/ops/persistence/index.html#fix-up-the-state



[4]https://brooklyn.apache.org/v/latest/ops/persistence/index.html#ignore-errors





On Mon, 31 Jul 2017 at 07:57 Taylor <[email protected]> wrote:
I am having a problem with brooklyn. If I start/stop the service things are ok. 
If I take a snapshot and revert to the snapshot, I see the following:


[root@localhost ~]# systemctl status brooklyn
brooklyn.service - Apache Brooklyn Service
    Loaded: loaded 
(/etc/systemd/system/multi-user.target.wants/brooklyn.service)
    Active: active (running) since Sun 2017-07-30 17:19:55 EDT; 44s ago
      Docs: https://brooklyn.apache.org/documentation/index.html
  Main PID: 651 (java)
    CGroup: /system.slice/brooklyn.service
            └─651 /usr/bin/java -Dbrooklyn.location.localhost.address=127.0.0.1 
-XX:SoftRefLRUPolicyMSPerMB=1 
-Dlogback.configurationFile=/etc/brooklyn/logback.xml -Xms256m -Xmx1g 
-XX:MaxP...

Jul 30 17:20:11 localhost.localdomain java[651]: 2017-07-30 17:20:11,553 INFO  
Started Brooklyn console at http://127.0.0.1:8081/, running 
classpath://brooklyn.war@/
Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,401 INFO  Geo info 
lookup for 127.0.0.1/127.0.0.1 returned: HostGeoInfo[RCN 
Corporation, Chicago (US): 127...4096374512)]
Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,736 ERROR 
Subsystem for persistence had startup error (continuing with startup): 
java.lang.IllegalStateExc...was scanning
Jul 30 17:20:13 localhost.localdomain java[651]: 
java.lang.IllegalStateException: Node record nodes/vmL5HEpG could not be read 
when upxGnvJq was scanning
Jul 30 17:20:13 localhost.localdomain java[651]: at 
org.apache.brooklyn.core.mgmt.ha.ManagementPlaneSyncRecordPersisterToObjectStore.loadSyncRecord(ManagementPlaneSyncRecordPe....jar:0.11.0]
Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,736 WARN  
Loading catalog for INITIALIZING as part of launch sequence (it was not loaded 
as part of the rebind sequence)
Jul 30 17:20:18 localhost.localdomain java[651]: 2017-07-30 17:20:18,851 INFO  
Launched Brooklyn; will now block until shutdown command received via GUI/API 
(recommended) or p...s interrupt.
Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309 WARN  
Disallowing web request as server not in required HA hot state: 
http://192.168.1.14:8081/v1/catalog/applicat...
Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309 WARN  
Disallowing web request as server not in required HA hot state: 
http://192.168.1.14:8081/v1/loca...s' to force)
Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309 WARN  
Disallowing web request as server not in required HA hot state: 
http://192.168.1.14:8081/v1/catalog/entities...
Hint: Some lines were ellipsized, use -l to show in full.



