Re: JVM crash on 64 bit SPARC with Elasticsearch 1.2.2 due to unaligned memory access

[email protected] Tue, 26 Aug 2014 12:42:12 -0700

I fixed the issue by setting the safe LZF encoder in LZFCompressor and
opened a pull request


https://github.com/elasticsearch/elasticsearch/pull/7466

Jörg
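
For readers following the thread, here is a minimal sketch of what a "safe" (non-Unsafe) read looks like in general. It is not the code from the pull request or from the LZF library; the class name SafeReads and both method names are made up for illustration. The idea is that single-byte loads have no alignment requirement, so assembling the value byte by byte works at any offset on any CPU.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only (not Elasticsearch or LZF code).
final class SafeReads {

    // Assemble a little-endian long one byte at a time, highest byte first.
    public static long readLongLE(byte[] buf, int offset) {
        long value = 0;
        for (int i = 7; i >= 0; i--) {
            value = (value << 8) | (buf[offset + i] & 0xFFL);
        }
        return value;
    }

    // Same idea for a little-endian int.
    public static int readIntLE(byte[] buf, int offset) {
        return (buf[offset] & 0xFF)
                | (buf[offset + 1] & 0xFF) << 8
                | (buf[offset + 2] & 0xFF) << 16
                | (buf[offset + 3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i + 1);
        }
        int offset = 3; // deliberately not a multiple of 8

        long viaBytes = readLongLE(data, offset);
        // Cross-check against the JDK's own little-endian absolute read.
        long viaBuffer = ByteBuffer.wrap(data)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getLong(offset);

        System.out.println(viaBytes == viaBuffer); // prints "true"
    }
}
```

The trade-off is a few more instructions per read compared with a single wide load, which is presumably why the unsafe variants were introduced in the first place.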


On Tue, Aug 26, 2014 at 8:17 PM, [email protected] <
[email protected]> wrote:

> Still broken with lzf-compress 1.0.3
>
> https://gist.github.com/jprante/d2d829b497db4963aea5
>
> Jörg
>
>
> On Tue, Aug 26, 2014 at 7:54 PM, [email protected] <
> [email protected]> wrote:
>
>> Thanks for the logstash mapping command. I can reproduce it now.
>>
>> It's the LZF encoder that bails out at
>> org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt
>>
>> which uses in turn sun.misc.Unsafe.getInt
>>
>> I have created a gist of the JVM crash file at
>>
>> https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b
>>
>> There has been a fix in LZF lately
>> https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7
>>
>> for version 1.0.3 which has been released recently.
>>
>> I will build a snapshot ES version with LZF 1.0.3 and see if this works...
>>
>> Jörg
>>
>>
>>
>> On Mon, Aug 25, 2014 at 11:30 PM, <[email protected]> wrote:
>>
>>> I captured a WireShark trace of the interaction between ES and Logstash
>>> 1.4.1.  The error occurs even before my data is sent.  Can you try to
>>> reproduce it on your testbed with this message I captured?
>>>
>>> curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y
>>>
>>> Contents of file 'y':
>>> {
>>>   "template" : "logstash-*",
>>>   "settings" : {
>>>     "index.refresh_interval" : "5s"
>>>   },
>>>   "mappings" : {
>>>     "_default_" : {
>>>       "_all" : {"enabled" : true},
>>>       "dynamic_templates" : [ {
>>>         "string_fields" : {
>>>           "match" : "*",
>>>           "match_mapping_type" : "string",
>>>           "mapping" : {
>>>             "type" : "string", "index" : "analyzed", "omit_norms" : true,
>>>             "fields" : {
>>>               "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
>>>             }
>>>           }
>>>         }
>>>       } ],
>>>       "properties" : {
>>>         "@version": { "type": "string", "index": "not_analyzed" },
>>>         "geoip" : {
>>>           "type" : "object",
>>>           "dynamic": true,
>>>           "path": "full",
>>>           "properties" : {
>>>             "location" : { "type" : "geo_point" }
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>>
>>>
>>> On Monday, August 25, 2014 3:53:18 PM UTC-4, [email protected] wrote:
>>>>
>>>> I have no plugins installed (yet) and only changed "es.logger.level" to
>>>> DEBUG in logging.yml.
>>>>
>>>> elasticsearch.yml:
>>>> cluster.name: es-AMS1Cluster
>>>> node.name: "KYLIE1"
>>>> node.rack: amssc2client02
>>>> path.data: /export/home/apontet/elasticsearch/data
>>>> path.work: /export/home/apontet/elasticsearch/work
>>>> path.logs: /export/home/apontet/elasticsearch/logs
>>>> network.host: ********       <===== sanitized line; file contains actual server IP
>>>> discovery.zen.ping.multicast.enabled: false
>>>> discovery.zen.ping.unicast.hosts: ["s1", "s2", "s3", "s5" , "s6", "s7"]   <===== Also sanitized
>>>>
>>>> Thanks,
>>>> Tony
>>>>
>>>>
>>>>
>>>>
>>>> On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:
>>>>>
>>>>> I tested a simple "Hello World" document on Elasticsearch 1.3.2 with
>>>>> Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default settings.
>>>>>
>>>>> No issues.
>>>>>
>>>>> So I would like to know more about the settings in elasticsearch.yml,
>>>>> the mappings, and the installed plugins.
>>>>>
>>>>> Jörg
>>>>>
>>>>>
>>>>> On Sat, Aug 23, 2014 at 11:25 AM, [email protected] <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I have some Solaris 10 Sparc V440/V445 servers available and can try
>>>>>> to reproduce over the weekend.
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>>
>>>>>> On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> How big is it? Maybe I can have it anyway? I pulled two ancient
>>>>>>> UltraSPARCs out of my closet to try to debug your issue, but
>>>>>>> unfortunately they are a pita to work with (dead NVRAM battery on
>>>>>>> both, zeroed MAC address, etc.). I'd still love to get to the bottom
>>>>>>> of this.
>>>>>>>  On Aug 22, 2014 3:59 PM, <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Adrien,
>>>>>>>> It's a bunch of garbled binary data, basically a dump of the
>>>>>>>> process image.
>>>>>>>> Tony
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:
>>>>>>>>>
>>>>>>>>> Hi Tony,
>>>>>>>>>
>>>>>>>>> Do you have more information in the core dump file? (cf. the "Core
>>>>>>>>> dump written" line that you pasted)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 21, 2014 at 7:53 PM, <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>> I installed ES 1.3.2 on a spare Solaris 11 / T4-4 SPARC server to
>>>>>>>>>> scale out of a small x86 machine.  I get a similar exception running
>>>>>>>>>> ES with JAVA_OPTS=-d64.  When Logstash 1.4.1 sends the first message
>>>>>>>>>> I get the error below on the ES process:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> #
>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>> #
>>>>>>>>>> #  SIGBUS (0xa) at pc=0xffffffff7a9a3d8c, pid=14473, tid=209
>>>>>>>>>> #
>>>>>>>>>> # JRE version: 7.0_25-b15
>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode solaris-sparc compressed oops)
>>>>>>>>>> # Problematic frame:
>>>>>>>>>> # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
>>>>>>>>>> #
>>>>>>>>>> # Core dump written. Default location: /export/home/elasticsearch/elasticsearch-1.3.2/core or core.14473
>>>>>>>>>> #
>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>>>>> #
>>>>>>>>>>
>>>>>>>>>> ---------------  T H R E A D  ---------------
>>>>>>>>>>
>>>>>>>>>> Current thread (0x0000000107078000):  JavaThread "elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker #147}" daemon [_thread_in_vm, id=209, stack(0xffffffff5b800000,0xffffffff5b840000)]
>>>>>>>>>>
>>>>>>>>>> siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN), si_addr=0x0000000709cc09e7
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I can run ES using 32-bit Java but have to shrink ES_HEAP_SIZE
>>>>>>>>>> more than I want to.  Any assistance would be appreciated.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Tony
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> After upgrading from Elasticsearch 1.0.1 to 1.2.2 I'm getting
>>>>>>>>>>> JVM core dumps on Solaris 10 on SPARC.
>>>>>>>>>>>
>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>> #
>>>>>>>>>>> #  SIGBUS (0xa) at pc=0xffffffff7e452d78, pid=15483, tid=263
>>>>>>>>>>> #
>>>>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 1.7.0_55-b13)
>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode solaris-sparc compressed oops)
>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>> # V  [libjvm.so+0xc52d78]  Unsafe_GetLong+0x158
>>>>>>>>>>>
>>>>>>>>>>> I'm pretty sure the problem here is that Elasticsearch is making
>>>>>>>>>>> increasing use of "unsafe" functions in Java, presumably to speed
>>>>>>>>>>> things up, and some CPUs are more picky than others about memory
>>>>>>>>>>> alignment.  In particular, x86 will tolerate misaligned memory
>>>>>>>>>>> access whereas SPARC won't.
>>>>>>>>>>>
>>>>>>>>>>> Somebody has tried to report this to Oracle in the past and
>>>>>>>>>>> (understandably) Oracle has said that if you're going to use unsafe
>>>>>>>>>>> functions you need to understand what you're doing:
>>>>>>>>>>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021574
>>>>>>>>>>>
>>>>>>>>>>> A quick grep through the code of the two versions of
>>>>>>>>>>> Elasticsearch shows that the new use of "unsafe" memory access
>>>>>>>>>>> functions is in the BytesReference, MurmurHash3 and
>>>>>>>>>>> HyperLogLogPlusPlus classes:
>>>>>>>>>>>
>>>>>>>>>>> bash-3.2$ git checkout v1.0.1
>>>>>>>>>>> Checking out files: 100% (2904/2904), done.
>>>>>>>>>>>
>>>>>>>>>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public enum UnsafeUtils {
>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:            if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:            } else if (UnsafeUtils.equals(key, get(curId, spare))) {
>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:                return UnsafeUtils.equals(b1, b2);
>>>>>>>>>>>
>>>>>>>>>>> bash-3.2$ git checkout v1.2.2
>>>>>>>>>>> Checking out files: 100% (2220/2220), done.
>>>>>>>>>>>
>>>>>>>>>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:                return UnsafeUtils.equals(a.array(), a.arrayOffset(), b.array(), b.arrayOffset(), a.length());
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java: return UnsafeUtils.readLongLE(key, blockOffset);
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:                long k1 = UnsafeUtils.readLongLE(key, i);
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:                long k2 = UnsafeUtils.readLongLE(key, i + 8);
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/BytesRefHash.java: if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/BytesRefHash.java: } else if (UnsafeUtils.equals(key, get(curId, spare))) {
>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public enum UnsafeUtils {
>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/metrics/cardinality/HyperLogLogPlusPlus.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/metrics/cardinality/HyperLogLogPlusPlus.java:            return UnsafeUtils.readIntLE(readSpare.bytes, readSpare.offset);
>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:                return UnsafeUtils.equals(b1, b2);
>>>>>>>>>>>
>>>>>>>>>>> Presumably one of these three new uses is what is causing the
>>>>>>>>>>> JVM SIGBUS error I'm seeing.
>>>>>>>>>>>
>>>>>>>>>>> A quick look at the MurmurHash3 class shows that the hash128
>>>>>>>>>>> method accepts an arbitrary offset and passes it to an unsafe
>>>>>>>>>>> function with no check that it's a multiple of 8:
>>>>>>>>>>>
>>>>>>>>>>>     public static Hash128 hash128(byte[] key, int offset, int length, long seed, Hash128 hash) {
>>>>>>>>>>>         long h1 = seed;
>>>>>>>>>>>         long h2 = seed;
>>>>>>>>>>>
>>>>>>>>>>>         if (length >= 16) {
>>>>>>>>>>>
>>>>>>>>>>>             final int len16 = length & 0xFFFFFFF0; // higher multiple of 16 that is lower than or equal to length
>>>>>>>>>>>             final int end = offset + len16;
>>>>>>>>>>>             for (int i = offset; i < end; i += 16) {
>>>>>>>>>>>                 long k1 = UnsafeUtils.readLongLE(key, i);
>>>>>>>>>>>                 long k2 = UnsafeUtils.readLongLE(key, i + 8);
>>>>>>>>>>>
>>>>>>>>>>> This is a recipe for generating JVM core dumps on architectures
>>>>>>>>>>> such as SPARC, Itanium and PowerPC that don't support unaligned
>>>>>>>>>>> 64-bit memory access.
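
The alignment argument above can be checked with a little arithmetic. The sketch below is illustrative only: the class AlignmentCheck and its isAligned method are hypothetical, and the 16-byte array base is an assumption (the real value depends on the JVM and its settings). The si_addr value is the one from the SIGBUS report quoted earlier in this thread, where the faulting frame was Unsafe_GetInt, a 4-byte load.

```java
// Hypothetical helper, for illustration only.
final class AlignmentCheck {

    // True if address is a multiple of size; size must be a power of two.
    public static boolean isAligned(long address, int size) {
        return (address & (size - 1)) == 0;
    }

    public static void main(String[] args) {
        // si_addr from the crash report in this thread (BUS_ADRALN).
        long siAddr = 0x0000000709cc09e7L;
        System.out.println(isAligned(siAddr, 4)); // prints "false"

        // ASSUMPTION: the first array element sits at an 8-byte-aligned
        // position, here 16 bytes from the start of the array object.
        long assumedArrayBase = 16;
        for (int offset = 0; offset <= 8; offset++) {
            // Under that assumption, an 8-byte read at key[offset] is
            // aligned only when offset is a multiple of 8.
            System.out.println(offset + " -> "
                    + isAligned(assumedArrayBase + offset, 8));
        }
    }
}
```

An address ending in 0x...e7 is odd, so it cannot be 4-byte aligned, which is consistent with the BUS_ADRALN code in the report.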
>>>>>>>>>>>
>>>>>>>>>>> Does Elasticsearch have any policy for support of hardware other
>>>>>>>>>>> than x86?  If not, I don't think many people would care but you
>>>>>>>>>>> really ought to clearly say so on your platform support page.  If
>>>>>>>>>>> you do intend to support non-x86 architectures then you need to be
>>>>>>>>>>> much more careful about the use of unsafe memory accesses.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>  --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "elasticsearch" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/eb7f4c23-b63
>>>>>>>>>> e-4c2e-87c3-029fc58449fc%40googlegroups.com
>>>>>>>>>> <https://groups.google.com/d/msgid/elasticsearch/eb7f4c23-b63e-4c2e-87c3-029fc58449fc%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>> .
>>>>>>>>>>
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Adrien Grand
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>
>>
>>
>

