Hi Andrea,

totally accurate analysis.

Also, meta-tile locks serve a very different purpose than config locks IIRC.
Meta-tile locks are most useful at seed-on-demand time, where you get
concurrent tile requests from the browser/client that fall into the same
meta-tile and there's a cache hit, so that all the tiles that comprise the
meta-tile are generated once.
May the not exist, you get a perf degradation, but not an inconsistent
server state. Which means that in a single instance deployment scenario an
in-memory locking strategy is all you need, and stripped locks work just
fine.
That shared locking mechanism might have been added (guessing) for the
clustered deployment scenario where the load balancer sends tile requests
that fall on the same metatile to different nodes. Nonetheless, some sort
of meta-tile key based distributed lock is all you need, and shall not be
shared with the global config lock AT ALL.




On Thu, 8 Apr 2021 at 05:57, Andrea Aime <[email protected]>
wrote:

> Hi all,
> I've been recently tasked to find out why one of our GeoServer was getting
> in a deadlock preventing
> further requests.
>
> By the looks of it, all threads were dead on the GWC metatile locks. The
> common setup for
> those is to use the MemoryLockProvider, a striped lock having 1024
> ReentrantLock instances,
> which are chosen for a given metatile by hashing its path
> <https://github.com/GeoWebCache/geowebcache/blob/main/geowebcache/core/src/main/java/org/geowebcache/locks/MemoryLockProvider.java#L68>
> .
> This class was then copied over to GeoServer, by Boundless, while working
> on the Resource
> subsystem. I did voice concerns about doing that, the locks was designed
> to work against
> meta tiles, not files, but guess I could not pinpoint exactly why that
> could cause issues.
>
> In normal conditions the locking works pretty well, and it has been around
> for years, so why was it causing
> trouble? Turns out a colleague tried out a configuration we never use,
> which makes GWC use
> the same locks set in the global configuration, e.g.:
>
> [image: image.png]
>
> While the "global locking" can be found in the global panel as "file
> locking":
>
> [image: image.png]
> At first I told myself, well, ok, sharing locks for different purposes is
> a bad idea for sure: deadlock prevention 101,
> all shared resources need to be acquired in a single shot, not one by one,
> so if a thread grabs a lock for tile generation,
> and then while running the internal GetMap, for whatever reason, due to
> whatever plugin, it grabs a second lock
> on a data directory Resource, we're in trouble, a deadlock can happen.
>
> At the same time I thought, but we don't use Resource.lock() all that
> much... I was wrong.
> Every time the input or output stream of a resource are used, a lock is
> acquired around their
> usage, transparently, by the FileSytemResourceStore.FileSystemResource, by
> the in() and out() methods.
> Ouch, that makes the deadlock so much more likely to happen.
>
> My first suggestion would be, don't allow admins to shoot themselves in
> the foot, and remove, from the
> caching defaults, the option to share the locks between GWC and Resource.
> This should be done ASAP in my opinion.
> Objections to this point?
>
> The second thing is, using a striped lock for file system access like
> FileSystemResourceStore ends up doing,
> is a really bad idea... if two threads manage to try and use 2 files that
> end up in the same MemoryLockProvider
> buckets by chance of hashing, and they grab them in opposite order of
> acquisition, they will deadlock each other
> (T1 grabs the lock on bucket A, T2 on bucket B, T1 goes to grab its second
> resource, which happens to also hit
> bucket B, T2 does the same with bucked A, and the deadlock is done).
>
> The chance can be lessened by not using striped locks, but creating unique
> reentrant locks for each string
> (probably backed by a weak hash map). Not entirely trivial to do, the
> reentrant lock creation needs to be well guarded.
> Two threads will still manage to deadlock each other, but they have to use
> the exact two same files, in opposite
> order of acquisition. Which is at least less likely to happen.
>
> But, I'm wondering if the Resource should automatically grab locks when
> using in or out at all,
> instead of allowing the caller to explicitly manage it, if and when
> needed.
> In the default GeoServer configuration, there is not even a need for them:
> the configuration global lock covers it.
> That lock prevents two admin requests to write at the same time (the
> global lock is more
> than just safety for IO, it's our poor man serializable transaction
> mechanism, it's important to prevent inconsistent
> changes to related catalog resources that often happen while doing admin
> tasks).
> Also checking what JDBCResourceStore does, it does not acquire locks
> around in() or out().
>
> Opinions?
>
> Cheers
> Andrea
>
> == GeoServer Professional Services from the experts! Visit
> http://goo.gl/it488V for more information. == Ing. Andrea Aime @geowolf
> Technical Lead GeoSolutions S.A.S. Via di Montramito 3/A 55054 Massarosa
> (LU) phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549
> http://www.geo-solutions.it http://twitter.com/geosolutions_it
> ------------------------------------------------------- *Con riferimento
> alla normativa sul trattamento dei dati personali (Reg. UE 2016/679 -
> Regolamento generale sulla protezione dei dati “GDPR”), si precisa che ogni
> circostanza inerente alla presente email (il suo contenuto, gli eventuali
> allegati, etc.) è un dato la cui conoscenza è riservata al/i solo/i
> destinatario/i indicati dallo scrivente. Se il messaggio Le è giunto per
> errore, è tenuta/o a cancellarlo, ogni altra operazione è illecita. Le
> sarei comunque grato se potesse darmene notizia. This email is intended
> only for the person or entity to which it is addressed and may contain
> information that is privileged, confidential or otherwise protected from
> disclosure. We remind that - as provided by European Regulation 2016/679
> “GDPR” - copying, dissemination or use of this e-mail or the information
> herein by anyone other than the intended recipient is prohibited. If you
> have received this email by mistake, please notify us immediately by
> telephone or e-mail.*
> _______________________________________________
> Geoserver-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/geoserver-devel
>


-- 
Gabriel Roldán
_______________________________________________
Geoserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Reply via email to