On 22/07/17 18:03, Andy Seaborne wrote:


On 22/07/17 12:10, Dave Reynolds wrote:
# Summary

We've started seeing an unusual JVM error message in some Fuseki deployments:

java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

I don't think Jena itself does any Java unsafe operations.

"The web" seems to think that mmap files, and running out of disk space can cause this. Or running out of shared mapped space.

Also:
http://users.jena.apache.narkive.com/n6Tddf3t/jena-fuseki-0-2-5-java-lang-internalerror-during-sparql-update
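For anyone wanting to check the mmap theory on an affected machine, here is a minimal sketch, not taken from Jena or TDB, of the kind of access involved. In HotSpot, reads and writes on a `MappedByteBuffer` are compiled down to raw memory operations. If the page behind a mapping cannot be supplied (for example the disk fills up or the file is truncated underneath it), the resulting fault can surface as this `InternalError` rather than as an `IOException`. The class name `MmapProbe`, the method `roundTrip` and the 4096-byte mapping size are hypothetical names and values chosen for illustration, and the sketch assumes Linux and Java 8 or later.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical probe, not part of Jena/TDB: exercises a memory-mapped file.
public class MmapProbe {

    // Maps the first 4096 bytes of the file, writes a value and reads it back.
    // Mapping READ_WRITE beyond the current file size extends the file.
    static long roundTrip(Path file, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            try {
                buf.putLong(0, value); // plain memory store into the mapping
                buf.force();           // flush the dirty page to the file
                return buf.getLong(0); // plain memory load from the mapping
            } catch (InternalError e) {
                // A fault on the mapped page is reported by the JVM as InternalError.
                throw new IOException("mapped access faulted: " + e.getMessage(), e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-probe", ".dat");
        try {
            long back = roundTrip(tmp, 42L);
            long usable = Files.getFileStore(tmp).getUsableSpace();
            System.out.println("read back " + back + ", usable bytes on volume: " + usable);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

On a healthy system this only confirms that mapped reads and writes work and reports the free space on the volume. It does not reproduce the failure, but running it against the volume holding the TDB files may help rule the basics in or out.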


Jetty may well do, especially old architecture Jetty 8.


I don't think this is a Jena issue so not raising a Jira, but if anyone has seen this and has any workarounds or correlations that might help track it down I'd be grateful for any hints.

# Details

This is a largish service [1] with around 400MT, running Fuseki/TDB.
   fuseki1 1.3.0

Fuseki1 uses Jetty 8 which is an old architecture.

Fuseki2 uses Jetty 9.


   java openjdk 1.8.0_131
   ubuntu 16.04

It has been running stably for over a year on AWS EC2 servers. Several weeks ago we shifted to a newer EC2 instance type (i3.large) which has faster (nvme) disks. It has been solid [2] for the last few weeks answering a standard set of large queries daily. Currently running on two load balanced instances.

Then suddenly both servers have started failing with the above error message when attempting the same large queries that have been working up till now. The query is a relatively straightforward select with a sort but returns several GB of (streaming) results taking 3-5 minutes - so definitely memory and disk intensive.

This occurred on both instances at the same time. The only correlation was that we did a system update on those servers the day before and this would have been the first time the big queries ran since that. So it seems quite likely that something in the system update has prompted this. However, I can't see any likely culprits - the update included libepoxy, lxcfs, libc-bin, systemd, ureadahead, man-db, update-initramfs but no java update and we've not changed the fuseki version. The systems both rebooted cleanly after the update.

Searching for mentions of that error generally suggests JVM bugs, or running out of memory or disk. The systems have plenty of spare disk and memory. In particular they are 16GB and typically run with 5GB used and nearly all the rest in buff/cache (leaving about 100MB actually free, which is typical).

A reboot has cleared the errors on both servers and I have no way of reliably recreating the problem. Might not even come back :) However, it's worrying enough that I thought I'd post it in case anyone else has seen/sees anything similar.

Dave

[1] http://environment.data.gov.uk/water-quality/view/landing

[2] There is an issue with nvme disks on Ubuntu which leads to disk read errors but there's a known workaround we've applied and we run quite a few of this server class at the moment without problems. In any case, none of the errors associated with the disk driver issues appear in the syslogs for these instances.
