Fuseki lock-up - any ideas?

Dave Reynolds Tue, 16 Feb 2021 02:50:26 -0800

We have a mysterious problem with fuseki in production that we've not seen before. Posting in case anyone has seen something similar and has any advice, but I realise there's not really much here to go on.


Environment:
   Fuseki 3.17 (was 3.16, tried upgrade just in case) using TDB1
   OpenJDK java 8
   Docker container (running in k8s pod)
   AWS EBS file system
   O(2k) small updates per day (uses RDFConnection to send update)
   Variable read request rate but issue hits at low request levels


Symptoms are that fuseki receives an update request but never completes it:

    INFO  550175  POST http://localhost:3030/ds
    INFO  550175  Update
    INFO  550175  204 No Content (20 ms)
    INFO  550176  POST http://localhost:3030/ds
    INFO  550176  Update
-->
    INFO  550178  GET http://localhost:3030/ds?query=ASK+%7B+%3Fs+%3Fp+%3Fo+%7D
    INFO  550178  Query = ASK { ?s ?p ?o }
    INFO  550179  GET http://localhost:3030/ds?query=ASK+%7B+%3Fs+%3Fp+%3Fo+%7D
    INFO  550179  Query = ASK { ?s ?p ?o }

So no 204 return from request 550176.

From that point on fuseki continues to log incoming read queries but does not answer any of them, and the update request never terminates. It acts as if there's some form of deadlock.

Update requests are serialised, there's never more than one in flight at a time.
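To make "serialised" concrete, here is a minimal sketch of the pattern, not our actual client code. The class name `SerialisedUpdater` and the `sender` callback are hypothetical; in a setup like ours the callback would presumably wrap a call to `RDFConnection.update(String)`. The timeout is an addition for illustration, so that a stuck update is reported to the caller rather than blocking it forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical helper for illustration only.
final class SerialisedUpdater implements AutoCloseable {
    // A single worker thread guarantees at most one update is in flight.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Assumption: 'sender' wraps whatever actually posts the update,
    // e.g. a call to RDFConnection.update(String).
    private final Consumer<String> sender;

    SerialisedUpdater(Consumer<String> sender) {
        this.sender = sender;
    }

    // Returns true if the update completed within timeoutMs,
    // false if it was still pending when the timeout expired.
    boolean send(String update, long timeoutMs)
            throws InterruptedException, ExecutionException {
        Future<?> pending = worker.submit(() -> sender.accept(update));
        try {
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Best effort only: blocked network I/O may ignore the interrupt.
            pending.cancel(true);
            return false;
        }
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

A `false` return from `send` would correspond to the situation in the log above, where the update is received but no 204 ever comes back.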

It's not the update itself that's the issue. It's small, and if the container is restarted with the same data and the same update sequence is reapplied it all works fine.


The jvm stats all look completely fine in the prometheus records.

The various parts of this setup have been in various production settings without problems in the past. In particular, we've run the exact same pattern of mixed updates and queries in fuseki in a k8s environment for two years without ever having a lockup. But on a new deployment it's happening every few days.

There are differences between the new and old deployments but the ones we've identified seem very unlikely to be the cause. We've not used RDFConnection in the client before but can't see how that could affect this. We don't often run with TDB on EBS but we do have a dozen instances of that around which haven't had problems. We have generally shifted to AWS Corretto as the jvm but we have plenty of OpenJDK instances around without problems. The docker image is slightly unusual in using the s6 overlay init system rather than running fuseki as the root process, but again we can't see how this might cause these symptoms, and other uses of that, with fuseki, have been fine.

We'll find a workaround eventually, possibly involving shifting to TDB2, but posting in case anyone has had an experience similar enough to this to give us some hints.


Dave
