I'm still progressively testing LINSTOR on a group of test servers, and have now come across a new problem.

I had everything working nicely with 4 nodes: "linstor n l" showed all 4 online, machines would automatically come back after a reboot, and it all looked good. I then deleted all the storage pools and re-created them (same name on every server), followed the docs to create my first resource-definition and volume-definition, and used auto-placement on 3 nodes.
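
For reference, the commands I ran were roughly the following (I'm assuming the LVM backend here; the volume group name is just a placeholder and the volume size is approximate):

# storage pools re-created on every node with the same name (VG name is a placeholder)
linstor storage-pool create lvm castle pool_hdd vg_hdd
linstor storage-pool create lvm san5 pool_hdd vg_hdd
linstor storage-pool create lvm san6 pool_hdd vg_hdd
linstor storage-pool create lvm san7 pool_hdd vg_hdd

# first resource, auto-placed on 3 of the 4 nodes
linstor resource-definition create windows-wm
linstor volume-definition create windows-wm 48M
linstor resource create windows-wm --auto-place 3 --storage-pool pool_hdd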

I then decided to get clever and created another 9 test resources, this time with auto-placement on 4 nodes, roughly as follows (again, the size is approximate):
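
# repeated for testvm1 through testvm9
linstor resource-definition create testvm1
linstor volume-definition create testvm1 100M
linstor resource create testvm1 --auto-place 4 --storage-pool pool_hdd

All of this worked well, and I got a status like this: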

╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource   ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ castle ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm1    ┊ pool_hdd    ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm2    ┊ pool_hdd    ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm3    ┊ pool_hdd    ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm4    ┊ pool_hdd    ┊     0 ┊    1004 ┊ /dev/drbd1004 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm5    ┊ pool_hdd    ┊     0 ┊    1005 ┊ /dev/drbd1005 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm6    ┊ pool_hdd    ┊     0 ┊    1006 ┊ /dev/drbd1006 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm7    ┊ pool_hdd    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm8    ┊ pool_hdd    ┊     0 ┊    1008 ┊ /dev/drbd1008 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ san7   ┊ testvm9    ┊ pool_hdd    ┊     0 ┊    1009 ┊ /dev/drbd1009 ┊ 102.42 MiB ┊ Unused ┊ UpToDate ┊
┊ castle ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san5   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
┊ san6   ┊ windows-wm ┊ pool_hdd    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊  49.16 MiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

You can see at the end that windows-wm is only on 3 nodes, while the other testvm resources are on 4. So I ran this command to add windows-wm to the 4th node:

linstor resource create san7 windows-wm --storage-pool pool_hdd
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
Description:
    New resource 'windows-wm' on node 'san7' registered.
Details:
    Resource 'windows-wm' on node 'san7' UUID is: 50e34cac-6702-45c4-b242-b644415a7644
SUCCESS:
Description:
    Volume with number '0' on resource 'windows-wm' on node 'san7' successfully registered
Details:
    Volume UUID is: f6529463-03ac-4b31-af69-67b3153de355
SUCCESS:
    Added peer(s) 'san7' to resource 'windows-wm' on 'san6'
SUCCESS:
    Added peer(s) 'san7' to resource 'windows-wm' on 'castle'
SUCCESS:
    Added peer(s) 'san7' to resource 'windows-wm' on 'san5'
SUCCESS:
    Created resource 'windows-wm' on 'san7'
SUCCESS:
Description:
    Resource 'windows-wm' on 'san7' ready
Details:
    Node(s): 'san7', Resource: 'windows-wm'

This looked promising and seemed to work, so I ran "linstor volume list" again, but this time got an error:

linstor volume list
ERROR:
Show reports:
    linstor error-reports show 5F733CD9-00000-000004

The contents of that error report are below:
ERROR REPORT 5F733CD9-00000-000004

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID:                           678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T10:27:49+00:00
Error time:                         2020-09-30 00:13:34
Node:                               castle.websitemanagers.com.au

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'deviceProviderKindAsString', Source file 'Json.java', Line #73


Call backtrace:

    Method                                   Native Class:Line number
    deviceProviderKindAsString               N com.linbit.linstor.api.rest.v1.serializer.Json:73
    apiToVolume                              N com.linbit.linstor.api.rest.v1.serializer.Json:664
    lambda$apiToResourceWithVolumes$2        N com.linbit.linstor.api.rest.v1.serializer.Json:477
    accept                                   N java.util.stream.ReferencePipeline$3$1:195
    forEachRemaining                         N java.util.ArrayList$ArrayListSpliterator:1655
    copyInto                                 N java.util.stream.AbstractPipeline:484
    wrapAndCopyInto                          N java.util.stream.AbstractPipeline:474
    evaluateSequential                       N java.util.stream.ReduceOps$ReduceOp:913
    evaluate                                 N java.util.stream.AbstractPipeline:234
    collect                                  N java.util.stream.ReferencePipeline:578
    apiToResourceWithVolumes                 N com.linbit.linstor.api.rest.v1.serializer.Json:506
    lambda$listVolumesApiCallRcWithToResponse$1 N com.linbit.linstor.api.rest.v1.View:112
    accept                                   N java.util.stream.ReferencePipeline$3$1:195
    forEachRemaining                         N java.util.ArrayList$ArrayListSpliterator:1655
    copyInto                                 N java.util.stream.AbstractPipeline:484
    wrapAndCopyInto                          N java.util.stream.AbstractPipeline:474
    evaluateSequential                       N java.util.stream.ReduceOps$ReduceOp:913
    evaluate                                 N java.util.stream.AbstractPipeline:234
    collect                                  N java.util.stream.ReferencePipeline:578
    lambda$listVolumesApiCallRcWithToResponse$2 N com.linbit.linstor.api.rest.v1.View:113
    onNext                                   N reactor.core.publisher.FluxFlatMap$FlatMapMain:378
    onNext                                   N reactor.core.publisher.FluxContextStart$ContextStartSubscriber:96
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
    onNext                                   N reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:385
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:242
    request                                  N reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribeInner                         N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:143
    onSubscribe                              N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:237
    trySubscribeScalarMap                    N reactor.core.publisher.FluxFlatMap:191
    subscribeOrReturn                        N reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N reactor.core.publisher.Flux:8311
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N reactor.core.publisher.Flux:8325
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    onNext                                   N reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber:121
    complete                                 N reactor.core.publisher.Operators$MonoSubscriber:1755
    onComplete                               N reactor.core.publisher.MonoStreamCollector$StreamCollectorSubscriber:167
    onComplete                               N reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber:395
    onComplete                               N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
    checkTerminated                          N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    innerComplete                            N reactor.core.publisher.FluxFlatMap$FlatMapMain:909
    onComplete                               N reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
    onComplete                               N reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:1989
    onComplete                               N reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
    complete                                 N reactor.core.publisher.FluxCreate$BaseSink:438
    drain                                    N reactor.core.publisher.FluxCreate$BufferAsyncSink:784
    complete                                 N reactor.core.publisher.FluxCreate$BufferAsyncSink:732
    drainLoop                                N reactor.core.publisher.FluxCreate$SerializedSink:239
    drain                                    N reactor.core.publisher.FluxCreate$SerializedSink:205
    complete                                 N reactor.core.publisher.FluxCreate$SerializedSink:196
    apiCallComplete                          N com.linbit.linstor.netcom.TcpConnectorPeer:455
    handleComplete                           N com.linbit.linstor.proto.CommonMessageProcessor:363
    handleDataMessage                        N com.linbit.linstor.proto.CommonMessageProcessor:287
    doProcessInOrderMessage                  N com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N reactor.core.publisher.FluxDefer:46
    subscribe                                N reactor.core.publisher.Flux:8325
    onNext                                   N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N reactor.core.publisher.UnicastProcessor:286
    drain                                    N reactor.core.publisher.UnicastProcessor:322
    onNext                                   N reactor.core.publisher.UnicastProcessor:401
    next                                     N reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N com.linbit.linstor.netcom.TcpConnectorPeer:373
    doProcessMessage                         N com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N reactor.core.scheduler.WorkerTask:84
    call                                     N reactor.core.scheduler.WorkerTask:37
    run                                      N java.util.concurrent.FutureTask:264
    run                                      N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N java.lang.Thread:834


END OF ERROR REPORT.

Since it looks relevant: error reports 1, 2 and 3 are all similar, one each for nodes castle, san5 and san6 (note that san7 was the 4th/newest node, the one I tried to add the resource to). The first of those reports is:
ERROR REPORT 5F733CD9-00000-000001

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.9.0
Build ID:                           678acd24a8b9b73a735407cd79ca33a5e95eb2e2
Build time:                         2020-09-23T10:27:49+00:00
Error time:                         2020-09-30 00:13:01
Node:                               castle.websitemanagers.com.au
Peer:                               Node: 'san6'

============================================================

Reported error:
===============

Category:                           Error
Class name:                         ImplementationError
Class canonical name:               com.linbit.ImplementationError
Generated at:                       Method 'createStorageRscData', Source file 'CtrlRscLayerDataMerger.java', Line #214

Error message:                      Received unknown storage resource from satellite

Asynchronous stage backtrace:

    Error has been observed at the following site(s):
        |_ checkpoint ⇢ Execute single-stage API NotifyRscApplied
        |_ checkpoint ⇢ Fallback error handling wrapper
    Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    createStorageRscData                     N com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:214

Suppressed exception 1 of 1:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:               reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'createStorageRscData', Source file 'CtrlRscLayerDataMerger.java', Line #214

Error message:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ Execute single-stage API NotifyRscApplied
    |_ checkpoint ⇢ Fallback error handling wrapper
Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    createStorageRscData                     N com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:214
    createStorageRscData                     N com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:65
    mergeStorageRscData                      N com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:285
    merge                                    N com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:138
    merge                                    N com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:148
    mergeLayerData                           N com.linbit.linstor.core.apicallhandler.AbsLayerRscDataMerger:93
    mergeLayerData                           N com.linbit.linstor.core.apicallhandler.CtrlRscLayerDataMerger:82
    updateVolume                             N com.linbit.linstor.core.apicallhandler.controller.internal.RscInternalCallHandler:228
    execute                                  N com.linbit.linstor.api.protobuf.internal.NotifyResourceApplied:45
    executeNonReactive                       N com.linbit.linstor.proto.CommonMessageProcessor:525
    lambda$execute$13                        N com.linbit.linstor.proto.CommonMessageProcessor:500
    doInScope                                N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
    lambda$fluxInScope$0                     N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
    call                                     N reactor.core.publisher.MonoCallable:91
    trySubscribeScalarMap                    N reactor.core.publisher.FluxFlatMap:126
    subscribeOrReturn                        N reactor.core.publisher.MonoFlatMapMany:49
    subscribe                                N reactor.core.publisher.Flux:8311
    onNext                                   N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
    request                                  N reactor.core.publisher.Operators$ScalarSubscription:2317
    onSubscribe                              N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
    subscribe                                N reactor.core.publisher.MonoCurrentContext:35
    subscribe                                N reactor.core.publisher.InternalFluxOperator:62
    subscribe                                N reactor.core.publisher.FluxDefer:54
    subscribe                                N reactor.core.publisher.Flux:8325
    onNext                                   N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N reactor.core.publisher.UnicastProcessor:286
    drain                                    N reactor.core.publisher.UnicastProcessor:322
    onNext                                   N reactor.core.publisher.UnicastProcessor:401
    next                                     N reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N com.linbit.linstor.netcom.TcpConnectorPeer:373
    doProcessMessage                         N com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N reactor.core.scheduler.WorkerTask:84
    call                                     N reactor.core.scheduler.WorkerTask:37
    run                                      N java.util.concurrent.FutureTask:264
    run                                      N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N java.lang.Thread:834


END OF ERROR REPORT.


So, questions:

1) Why did I end up in this state? I assume something was configured on castle/san5/san6 but not on san7.

2) How can I fix it?

Thanks,
Adam
