On 22/07/17 12:10, Dave Reynolds wrote:
# Summary
We've started seeing an unusual JVM error message in some Fuseki
deployments:

```
java.lang.InternalError: a fault occurred in a recent unsafe memory
access operation in compiled Java code
```
I don't think Jena itself does any Java unsafe operations.
Jetty may well do, especially old architecture Jetty 8.
I don't think this is a Jena issue so not raising a Jira, but if anyone
has seen this and has any workarounds or correlations that might help
track it down I'd be grateful for any hints.
# Details
This is a largish service [1] with around 400 million triples, running
Fuseki/TDB:

- Fuseki1 1.3.0 (Fuseki1 uses Jetty 8, which is an old architecture;
  Fuseki2 uses Jetty 9)
- Java: OpenJDK 1.8.0_131
- OS: Ubuntu 16.04
It has been running stably for over a year on AWS EC2 servers. Several
weeks ago we shifted to a newer EC2 instance type (i3.large), which has
faster (NVMe) disks. It has been solid [2] for the last few weeks,
answering a standard set of large queries daily, and is currently
running on two load-balanced instances.
Then, suddenly, both servers started failing with the above error when
attempting the same large queries that had been working until now. The
query is a relatively straightforward SELECT with a sort, but it
returns several GB of (streamed) results and takes 3-5 minutes, so it
is definitely memory- and disk-intensive.
This occurred on both instances at the same time. The only correlation
is that we ran a system update on those servers the day before, and
this would have been the first time the big queries ran since then. So
it seems quite likely that something in the system update prompted
this. However, I can't see any likely culprits: the update included
libepoxy, lxcfs, libc-bin, systemd, ureadahead, man-db and
update-initramfs, but no Java update, and we haven't changed the
Fuseki version. Both systems rebooted cleanly after the update.
Searching for mentions of that error generally turns up suggestions of
JVM bugs or of running out of memory or disk. These systems have
plenty of spare disk and memory: they have 16GB of RAM and typically
run with 5GB used and nearly all the rest in buff/cache (leaving about
100MB actually free, which is typical).
A reboot has cleared the errors on both servers, and I have no way of
reliably recreating the problem. It might not even come back :)
However, it seemed worrying enough that I thought I'd post it in case
anyone else has seen/sees anything similar.
Dave
[1] http://environment.data.gov.uk/water-quality/view/landing
[2] There is an issue with NVMe disks on Ubuntu which leads to disk
read errors, but there's a known workaround which we've applied, and we
run quite a few servers of this class at the moment without problems.
In any case, none of the errors associated with the disk driver issue
appear in the syslogs for these instances.