On 30/9/20 16:49, Gábor Hernádi wrote:
> Hi,

> I tried to recreate this issue, but without success.

> 4-node setup, all LVM:
> First create a resource with --auto-place 3.
> Create 9 other resources with --auto-place 4.
> Create the first resource on the 4th (missing) node.
> Check "linstor volume list".

> That means there has to be something else in your setup.
>
> What else did you do? I see that your "first" resource, "windows-wm", was more like the second resource, as it got minor-number 1001 instead of 1000; minor-number 1000 was later reused by "testvm1". However, was something broken with the "original" resource that used minor-number 1000?
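
(Just to check I'm reading that sequence right, it would be something like the following on my side; "res0" through "res9" are example names, and pool_hdd is my existing pool:)

    # first resource on only 3 of the 4 nodes
    linstor resource-definition create res0
    linstor volume-definition create res0 100M
    linstor resource create res0 --auto-place 3 --storage-pool pool_hdd

    # nine more resources, this time on all 4 nodes
    for i in $(seq 1 9); do
        linstor resource-definition create "res$i"
        linstor volume-definition create "res$i" 100M
        linstor resource create "res$i" --auto-place 4 --storage-pool pool_hdd
    done

    # finally, place the first resource on the node it is still missing from
    linstor resource create san7 res0 --storage-pool pool_hdd

    linstor volume list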

Unfortunately, yes, a whole bunch of things have been done on the first three nodes. I've been slowly messing around over the last few months, trying to get everything working. There was another "testvm3" created before, which I deleted before starting again with further testing...


> Error report 5F733CD9-00000-000004 is a NullPointerException, but this is most likely just a side-effect of the original issue.

>> Since it looks relevant, error reports 1, 2 and 3 are all similar for nodes castle, san5 and san6.

> What about error report 0? Not relevant for this issue?

Oops, I just didn't realise there was a 0 report... I've included it here now:

ERROR REPORT 5F733CD9-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID:                           678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T10:27:49+00:00
Error time:                         2020-09-30 00:12:19
Node:                               castle.websitemanagers.com.au
Peer:                               RestClient(192.168.5.207; 'PythonLinstor/1.4.0 (API1.0.4)')

============================================================

Reported error:
===============

Description:
    Dependency not found

Category:                           LinStorException
Class name:                         LinStorException
Class canonical name:               com.linbit.linstor.LinStorException
Generated at:                       Method 'checkStorPoolLoaded', Source file 'CtrlStorPoolResolveHelper.java', Line #225

Error message:                      Dependency not found

Error context:
    The storage pool 'DfltStorPool' for resource 'windows-wm' for volume number '0' is not deployed on node 'san7'.

Call backtrace:

    Method                                   Native Class:Line number
    checkStorPoolLoaded                      N com.linbit.linstor.CtrlStorPoolResolveHelper:225
    resolveStorPool                          N com.linbit.linstor.CtrlStorPoolResolveHelper:149
    resolveStorPool                          N com.linbit.linstor.CtrlStorPoolResolveHelper:65
    createVolumeResolvingStorPool            N com.linbit.linstor.core.apicallhandler.controller.CtrlVlmCrtApiHelper:72
    createResourceDb                         N com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiHelper:396
    createResourceInTransaction              N com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:171
    lambda$createResource$2                  N com.linbit.linstor.core.apicallhandler.controller.CtrlRscCrtApiCallHandler:143
    doInScope                                N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
    lambda$fluxInScope$0                     N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
    call                                     N reactor.core.publisher.MonoCallable:91
    trySubscribeScalarMap                    N reactor.core.publisher.FluxFlatMap:126
    subscribeOrReturn                        N reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N reactor.core.publisher.Flux:8311
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N reactor.core.publisher.Flux:8325
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N reactor.core.publisher.Flux:8325
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    onNext                                   N reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
    complete                                 N reactor.core.publisher.Operators$MonoSubscriber:1755
    onComplete                               N reactor.core.publisher.MonoCollect$CollectSubscriber:152
    onComplete                               N reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
    onComplete                               N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
    checkTerminated                          N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    drain                                    N reactor.core.publisher.FluxFlatMap$FlatMapMain:580
    onComplete                               N reactor.core.publisher.FluxFlatMap$FlatMapMain:457
    checkTerminated                          N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    innerComplete                            N reactor.core.publisher.FluxFlatMap$FlatMapMain:909
    onComplete                               N reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
    onComplete                               N reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
    onComplete                               N reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
    complete                                 N reactor.core.publisher.FluxCreate$BaseSink:438
    drain                                    N reactor.core.publisher.FluxCreate$BufferAsyncSink:784
    complete                                 N reactor.core.publisher.FluxCreate$BufferAsyncSink:732
    drainLoop                                N reactor.core.publisher.FluxCreate$SerializedSink:239
    drain                                    N reactor.core.publisher.FluxCreate$SerializedSink:205
    complete                                 N reactor.core.publisher.FluxCreate$SerializedSink:196
    apiCallComplete                          N com.linbit.linstor.netcom.TcpConnectorPeer:455
    handleComplete                           N com.linbit.linstor.proto.CommonMessageProcessor:363
    handleDataMessage                        N com.linbit.linstor.proto.CommonMessageProcessor:287
    doProcessInOrderMessage                  N com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N reactor.core.publisher.FluxDefer:46
    subscribe                                N reactor.core.publisher.Flux:8325
    onNext                                   N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N reactor.core.publisher.UnicastProcessor:286
    drain                                    N reactor.core.publisher.UnicastProcessor:322
    onNext                                   N reactor.core.publisher.UnicastProcessor:401
    next                                     N reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N com.linbit.linstor.netcom.TcpConnectorPeer:373
    doProcessMessage                         N com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N reactor.core.scheduler.WorkerTask:84
    call                                     N reactor.core.scheduler.WorkerTask:37
    run                                      N java.util.concurrent.FutureTask:264
    run                                      N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N java.lang.Thread:834


END OF ERROR REPORT.
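
The error context blames 'DfltStorPool' on san7, so I guess the thing to compare is which storage pools each node actually has; if I read the client help right, that's:

    linstor storage-pool list                # all nodes
    linstor storage-pool list --nodes san7   # just san7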

>> 1) Why did I end up in this state? I assume something was configured on castle/san5/san6 but not on san7.

> Not sure... If something were broken on san7, you should also have gotten an error report from a satellite. The ones you showed here were all created by the controller (error IDs of the form XXX-00000-YYY are always controller errors; satellite errors have some other "random-looking" number instead of the -00000- part).
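
Good to know. For the archives: the reports themselves can be pulled straight from the client, e.g.:

    linstor error-reports list
    linstor error-reports show 5F733CD9-00000-000000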

>> 2) How can I fix it?

> If I cannot recreate it, there is not much I can do. You could of course try restarting the controller; that will reload the data from the database, which might fix things... I would still be curious what caused all of this, though...
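
(On a systemd install, that should just be the following on the controller node, castle in my case:

    systemctl restart linstor-controller

)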


Sure, I will see if I can work it out. From the 0 error, it looks like I created some configuration under the name 'DfltStorPool', and this was probably not replicated to san7 (because san7 didn't exist back then). I'm not sure whether I would expect this to be copied to the node automatically if/when it is required, or whether I should get an error saying the resource can't be deployed due to a missing dependency; either way, I suspect it shouldn't fail the way it does at the moment...

OK, so I did restart the controller, and now "linstor volume list" returns this:

╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource   ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ castle ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

So it would appear that it did actually deploy windows-wm to san7, and it looks like it's all working again. I'm still rather unsure about my process, though; with all this testing, I worry I'll end up with an unstable system due to old config/testing bits left over.
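
For my own peace of mind, I'll keep an eye on it with something like:

    linstor resource list -r windows-wm   # client-side filter, if I read the help right
    drbdadm status windows-wm             # DRBD's own view, run on san7 itself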

To completely "reset" LINSTOR, I've so far found the following config and data locations:

/etc/linstor
/etc/drbd.d
/var/lib/linstor
/var/lib/linstor.d
/var/log/linstor

Plus, I assume, whatever storage pools have been configured on the backing devices. Is there anything else that should be wiped to ensure I'm starting with a clean slate (my best guess at the full sequence is sketched below)? I'd rather not format the whole system and reinstall...
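
In case it helps anyone else, my current guess at the full teardown is roughly the following; it is destructive and only a guess, so corrections welcome:

    # on every node: stop the LINSTOR services
    systemctl stop linstor-controller    # only where the controller runs
    systemctl stop linstor-satellite

    # take down whatever DRBD resources are still configured
    drbdadm down all

    # wipe the directories listed above
    # (/etc/drbd.d/global_common.conf is shipped by drbd-utils, so that one
    #  may be worth keeping)
    rm -rf /etc/linstor /var/lib/linstor /var/lib/linstor.d /var/log/linstor

    # and remove the backing LVM volumes that LINSTOR created
    lvs                         # check what is there first
    lvremove vg0/testvm1_00000  # hypothetical VG/LV names - use whatever lvs shows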

Thanks,
Adam

