I have overseen that many other commands handle this problem by using a waiting lock, so the related commands in this issue should do the same. I created Adding a new external network fails during auto-sync is running https://bugzilla.redhat.com/show_bug.cgi?id=1558054 to track this.
>From my point of view the other related two bugs I created are not required anymore, because other commands seems to use a waiting lock, too. Should we close 1557419 and 1557424? On Mon, 19 Mar 2018 09:40:32 +0100 Dominik Holler <[email protected]> wrote: > On Sat, 17 Mar 2018 16:16:49 +0200 > Dan Kenigsberg <[email protected]> wrote: > > > Thanks for filing those, but let us not keep them secret: > > > > Bug 1557419 - Recommend that a call should be tried again > > sounds most reasonable. Providing more information to the client > > makes sense > > > > Bug 1557424 - Automatically retry call failed because he failed to > > acquire a lock > > is more tricky. when would REST perform the retry? > > If the command returns a machine-readable hint that the conflict is > temporary. > > > how often? We must make sure we never cause a livelock. > > > > I agree that it is a good idea to have an upper limit for the count of > retries to prevent a livelock. > > > Can we alternatively take a normal blocking lock when we create an > > external network (instead of a trylock)? > > > > This might work, but currently no command is doing this. > (The lock is acquired in CommandBase.acquireLockInternal() which could > be overridden for all commands using an external network provider.) > But I would prefer a solution which solve the issue for other > commands, too. > > > On Fri, Mar 16, 2018 at 4:52 PM, Dominik Holler <[email protected]> > > wrote: > > > I have created > > > https://bugzilla.redhat.com/show_bug.cgi?id=1557419 and > > > https://bugzilla.redhat.com/show_bug.cgi?id=1557424 > > > to discuss how failing to acquire locks in engine should be > > > handled in engine's REST-API, in ovirt-sdk or in application (OST > > > in this case). > > > > > > > > > On Fri, 16 Mar 2018 09:32:47 +0000 > > > Dafna Ron <[email protected]> wrote: > > > > > >> Thank you for the fast reply and help. > > >> > > >> On Thu, Mar 15, 2018 at 8:21 PM, Dominik Holler > > >> <[email protected]> wrote: > > >> > > >> > On Thu, 15 Mar 2018 16:24:10 +0000 > > >> > Dafna Ron <[email protected]> wrote: > > >> > > > >> > > Hi, > > >> > > > > >> > > We have a failure on master for test > > >> > > 098_ovirt_provider_ovn.use_ovn_provider in project > > >> > > cockpit-ovirt. This seems to be a race because object is > > >> > > locked. also, the actual failure is logged as WARN and not > > >> > > ERROR. > > >> > > > > >> > > I don't think the patch is actually related to the failure > > >> > > but I think the test should be fixed. > > >> > > can you please review to make sure we do not have an actual > > >> > > regression and let me know if we need to open a bz to fix the > > >> > > test? > > >> > > > > >> > > > > >> > > *Link and headline of suspected patches: * > > >> > > *https://gerrit.ovirt.org/#/c/89020/2 > > >> > > <https://gerrit.ovirt.org/#/c/89020/2> - * > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > *wizard: Enable scroll on start page for low-res screensLink > > >> > > to > > >> > > Job:http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6374 > > >> > > <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/6374>Link > > >> > > to all > > >> > > logs:http://jenkins.ovirt.org/job/ovirt-master_change-queue- > > >> > tester/6374/artifacts > > >> > > <http://jenkins.ovirt.org/job/ovirt-master_change-queue- > > >> > tester/6374/artifacts>(Relevant) > > >> > > error snippet from the log: <error>2018-03-15 10:05:00,160-04 > > >> > > DEBUG > > >> > > [org.ovirt.engine.core.sso.servlets.OAuthTokenInfoServlet] > > >> > > (default task-10) [] Sending json response2018-03-15 > > >> > > 10:05:00,160-04 DEBUG > > >> > > [org.ovirt.engine.core.sso.utils.TokenCleanupUtility] > > >> > > (default task-10) [] Not cleaning up expired tokens2018-03-15 > > >> > > 10:05:00,169-04 INFO > > >> > > [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] > > >> > > (EE-ManagedThreadFactory-engineScheduled-Thread-90) > > >> > > [789edb23] Lock Acquired to object > > >> > > 'EngineLock:{exclusiveLocks='[c38a67ec-0b48-4e6f-be85- > > >> > 70c700df5483=PROVIDER]', > > >> > > sharedLocks=''}'2018-03-15 10:05:00,184-04 INFO > > >> > > [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] > > >> > > (EE-ManagedThreadFactory-engineScheduled-Thread-90) > > >> > > [789edb23] Running command: SyncNetworkProviderCommand > > >> > > internal: true.2018-03-15 10:05:00,228-04 DEBUG > > >> > > [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$ > > >> > PostgresSimpleJdbcCall] > > >> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] > > >> > > Compiled stored procedure. Call string is [{call > > >> > > getdcidbyexternalnetworkid(?)}]2018-03-15 10:05:00,228-04 > > >> > > DEBUG > > >> > > [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$ > > >> > PostgresSimpleJdbcCall] > > >> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] > > >> > > SqlCall for procedure [GetDcIdByExternalNetworkId] > > >> > > compiled2018-03-15 10:05:00,229-04 DEBUG > > >> > > [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] > > >> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] > > >> > > method: runQuery, params: [GetAllExternalNetworksOnProvider, > > >> > > IdQueryParameters:{refresh='false', filtered='false'}], > > >> > > timeElapsed: 353ms2018-03-15 10:05:00,239-04 INFO > > >> > > [org.ovirt.engine.core.bll.network.dc.AddNetworkCommand] > > >> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] > > >> > > Failed to Acquire Lock to object > > >> > > 'EngineLock:{exclusiveLocks='[network_1=NETWORK, > > >> > > c38a67ec-0b48-4e6f-be85-70c700df5483=PROVIDER]', > > >> > > sharedLocks=''}'2018-03-15 10:05:00,239-04 WARN > > >> > > [org.ovirt.engine.core.bll.network.dc.AddNetworkCommand] > > >> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] > > >> > > Validation of action 'AddNetwork' failed for user > > >> > > admin@internal-authz. Reasons: > > >> > > VAR__TYPE__NETWORK,VAR__ACTION__ADD,ACTION_TYPE_FAILED_PROVIDER_LOCKED,$ > > >> > > > > >> > providerId > > >> > > c38a67ec-0b48-4e6f-be85-70c700df54832018-03-15 > > >> > > 10:05:00,240-04 DEBUG > > >> > > [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] > > >> > > (default task-13) [e1328379-17b7-49f8-beb2-cf8331784828] > > >> > > method: runAction, params: [AddNetwork, > > >> > > AddNetworkStoragePoolParameters:{commandId='61b365ec-27c1- > > >> > 49af-ad72-f907df8befcd', > > >> > > user='null', commandType='Unknown'}], timeElapsed: > > >> > > 10ms2018-03-15 10:05:00,250-04 ERROR > > >> > > [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] > > >> > > (default task-13) [] Operation Failed: [Cannot add Network. > > >> > > Related operation on provider with the id > > >> > > c38a67ec-0b48-4e6f-be85-70c700df5483 is currently in > > >> > > progress. Please try again later.]2018-03-15 10:05:00,254-04 > > >> > > DEBUG [org.ovirt.engine.core.utils.servlet.LocaleFilter] > > >> > > (default task-14) [] Incoming locale 'en-US'. Filter > > >> > > determined locale to be 'en-US'2018-03-15 10:05:00,254-04 > > >> > > DEBUG [org.ovirt.engine.core.sso.servlets.OAuthTokenServlet] > > >> > > (default task-14) [] Entered OAuthTokenServlet Query String: > > >> > > null, Parameters : password = ***, grant_type = password, > > >> > > scope = ovirt-app-api ovirt-ext=token-info:validate, > > >> > > username = admin@internal, </error>* > > >> > > > >> > I will care about this. > > >> > The problem is that SyncNetworkProviderCommand is running in > > >> > the background and locking the provider, which blocks the lock > > >> > for the tested AddNetworkCommand. > > >> > The related changes are > > >> > core: Add locking for Add and RemoveNetworkCommand > > >> > https://gerrit.ovirt.org/#/c/85480/ > > >> > and > > >> > core: Add SyncNetworkProviderCommand > > >> > https://gerrit.ovirt.org/#/c/85134/ > > >> > > > >> > > > >> > > > > > > > _______________________________________________ > > > Devel mailing list > > > [email protected] > > > http://lists.ovirt.org/mailman/listinfo/devel > > > > > > > _______________________________________________ Devel mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/devel
