Taylor,

I'd say that persistence is fairly robust in brooklyn as it's heavily used
and well tested.

We use file-system based backup [1] in many cases and I haven't heard of
anyone having the problem you describe, which makes me think it has
something to do with using snapshots.  But a snapshot seems like it should
be fine even if brooklyn is writing to persistence when you take it
(otherwise I'd expect a number of other processes to be corrupted every
time you took a snapshot). I'd be interested to know whether you see the
same issue when you use the backup approach described here [2].
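
As a rough illustration of the file-system approach, here is a minimal
Python sketch that archives a persisted-state directory into a timestamped
tarball. The function name and the example paths are assumptions for
illustration, not anything shipped with brooklyn; point it at wherever your
install actually keeps its persisted state.

```python
import tarfile
import time
from pathlib import Path


def backup_persisted_state(state_dir, backup_dir):
    """Archive state_dir into a timestamped .tar.gz file under backup_dir.

    Both arguments are hypothetical paths chosen by the caller.
    Returns the path of the archive that was written.
    """
    state_dir = Path(state_dir)
    backup_dir = Path(backup_dir)
    if not state_dir.is_dir():
        raise FileNotFoundError(f"no persisted state directory at {state_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"{state_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps the archive paths relative to the directory name
        # rather than embedding the absolute path of the source.
        tar.add(str(state_dir), arcname=state_dir.name)
    return archive
```

For example, backup_persisted_state("/path/to/brooklyn-persisted-state",
"/path/to/backups") would write one archive per call, so older copies are
kept alongside newer ones.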

Having said that, in most cases I recommend using an object store (e.g. s3)
for persistence.  This should make persistence more reliable, lets you
make use of versioning, and is necessary for running brooklyn in HA.

For debugging issues with persistence I think that brooklyn itself is the
best option.  Setting the rebind.failureMode.rebind=fail_at_end [3] option
in brooklyn.properties and then examining the log output usually makes it
clear where something has gone wrong.  The persistence files are human
readable (they are basically entities, state, and relationships), so you
can also fairly easily edit them with a text editor.
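
Before editing by hand it can help to find which files are damaged. The
sketch below walks a persisted-state directory and reports files that are
empty or that fail to parse as XML. It assumes the persisted files are XML,
which you should check against your own state directory; any non-XML file
in there will also be flagged. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_suspect_files(state_dir):
    """Return (path, reason) pairs for files that look damaged.

    Assumption: every file under state_dir is expected to be well-formed
    XML. Empty files are reported without attempting to parse them.
    """
    suspects = []
    for path in sorted(Path(state_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_size == 0:
            suspects.append((path, "empty file"))
            continue
        try:
            ET.parse(str(path))
        except ET.ParseError as err:
            suspects.append((path, f"XML parse error: {err}"))
    return suspects
```

Running it over a copy of the state taken after a bad snapshot revert, and
comparing with a known-good backup, should narrow down which records need
attention.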

Regards

Duncan

[1]
https://brooklyn.apache.org/v/latest/ops/persistence/index.html#persisted-state-backup
[2]
https://brooklyn.apache.org/v/latest/ops/persistence/index.html#file-system-backup
[3]
https://brooklyn.apache.org/v/latest/ops/persistence/index.html#ignore-errors

On Mon, 31 Jul 2017 at 15:04 Taylor <[email protected]> wrote:

>
> Duncan,
>
> Thanks so much for the thorough response!
>
> I will be reviewing the links you sent today.
>
> With respect to the snapshot: I am running brooklyn on a CentOS VM hosted
> on XenServer. Since the original email I have been experimenting with
> snapshots to try and diagnose what the issue is. The only way I can take a
> snapshot and revert is if I stop the service and power off the vm before
> taking a snapshot (disk only, no memory). If I take the snapshot while the
> service is running or after the service is stopped the persisted state
> will get corrupted.
>
> This has me worried for the case of a production outage.
>
> Are there any tools to aid in fixing the persisted state manually?
>
> What mechanism is safe for backing up the persisted state? Can I backup
> while the service is running?
>
> Thanks,
>
> Taylor
>
> ------------------------------
> *From:* Duncan Grant <[email protected]>
> *Sent:* Monday, July 31, 2017 3:31 AM
> *To:* [email protected]
> *Cc:* Taylor
> *Subject:* Re: Booklyn fails to start
>
> Taylor,
>
> The error you're seeing is with Brooklyn failing to rebind to persisted
> state [1].  Could you explain what you mean when you are talking about
> taking a snapshot and then reverting to the snapshot (do you mean the VM
> image where you are running brooklyn?)
>
> There are a couple of ways to deal with problems with persisted state.
> You can either fix the persisted state manually [2] or you can have
> brooklyn ignore errors with persisted state when it starts [4].  Both of
> these run the risk of brooklyn becoming detached from existing
> applications, so back up your persistence directory (or object store) first.
>
> Let me know if this helps (or doesn't).  I'm also on IRC just now if
> you'd like some answers in real time.
>
> Regards
>
> Duncan
>
> [1]
> https://brooklyn.apache.org/v/latest/ops/persistence/index.html#rebinding-to-state
>
>
> [2]
> https://brooklyn.apache.org/v/latest/ops/persistence/index.html#determine-underlying-cause
>
>
> [3]
> https://brooklyn.apache.org/v/latest/ops/persistence/index.html#fix-up-the-state
>
>
> [4]
> https://brooklyn.apache.org/v/latest/ops/persistence/index.html#ignore-errors
>
>
>
>
> On Mon, 31 Jul 2017 at 07:57 Taylor <[email protected]> wrote:
>
>> I am having a problem with brooklyn. If I start/stop the service things
>> are ok. If I snapshot and revert to the snapshot I see the following:
>>
>>
>> [root@localhost ~]# systemctl status brooklyn
>> brooklyn.service - Apache Brooklyn Service
>>    Loaded: loaded
>> (/etc/systemd/system/multi-user.target.wants/brooklyn.service)
>>    Active: active (running) since Sun 2017-07-30 17:19:55 EDT; 44s ago
>>      Docs: https://brooklyn.apache.org/documentation/index.html
>>  Main PID: 651 (java)
>>    CGroup: /system.slice/brooklyn.service
>>            └─651 /usr/bin/java
>> -Dbrooklyn.location.localhost.address=127.0.0.1
>> -XX:SoftRefLRUPolicyMSPerMB=1
>> -Dlogback.configurationFile=/etc/brooklyn/logback.xml -Xms256m -Xmx1g
>> -XX:MaxP...
>>
>> Jul 30 17:20:11 localhost.localdomain java[651]: 2017-07-30 17:20:11,553
>> INFO  Started Brooklyn console at http://127.0.0.1:8081/, running
>> classpath://brooklyn.war@/
>> Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,401
>> INFO  Geo info lookup for 127.0.0.1/127.0.0.1 returned: HostGeoInfo[RCN
>> Corporation, Chicago (US): 127...4096374512)]
>> Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,736
>> ERROR Subsystem for persistence had startup error (continuing with
>> startup): java.lang.IllegalStateExc...was scanning
>> Jul 30 17:20:13 localhost.localdomain java[651]:
>> java.lang.IllegalStateException: Node record nodes/vmL5HEpG could not be
>> read when upxGnvJq was scanning
>> Jul 30 17:20:13 localhost.localdomain java[651]: at
>> org.apache.brooklyn.core.mgmt.ha.ManagementPlaneSyncRecordPersisterToObjectStore.loadSyncRecord(ManagementPlaneSyncRecordPe....jar:0.11.0]
>> Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,736
>> WARN  Loading catalog for INITIALIZING as part of launch sequence (it was
>> not loaded as part of the rebind sequence)
>> Jul 30 17:20:18 localhost.localdomain java[651]: 2017-07-30 17:20:18,851
>> INFO  Launched Brooklyn; will now block until shutdown command received via
>> GUI/API (recommended) or p...s interrupt.
>> Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309
>> WARN  Disallowing web request as server not in required HA hot state:
>> http://192.168.1.14:8081/v1/catalog/applicat...
>> Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309
>> WARN  Disallowing web request as server not in required HA hot state:
>> http://192.168.1.14:8081/v1/loca...s' to force)
>> Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309
>> WARN  Disallowing web request as server not in required HA hot state:
>> http://192.168.1.14:8081/v1/catalog/entities...
>> Hint: Some lines were ellipsized, use -l to show in full.
>>
>>
>>
