All praise should go to the fantastic Elasticsearch team, who did not hesitate to test the fix immediately and replaced it with a better-working solution, since the lzf-compress software has weaknesses regarding thread safety.
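For context on the fix discussed below: the crash came from the Unsafe-based LZF chunk encoder reading 32-bit words at arbitrary, possibly unaligned offsets. The "safe" encoder the fix switches to assembles each word byte by byte instead, which any CPU accepts at any alignment. A rough sketch of that difference (hypothetical demo class, not the actual lzf-compress code):

```java
public class SafeGetInt {
    // What a "safe" chunk encoder does in place of sun.misc.Unsafe.getInt:
    // build the big-endian int from individual bytes, valid at any offset.
    static int getIntBE(byte[] buf, int off) {
        return ((buf[off]     & 0xFF) << 24)
             | ((buf[off + 1] & 0xFF) << 16)
             | ((buf[off + 2] & 0xFF) << 8)
             |  (buf[off + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] data = {0x00, (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        // offset 1 would be an unaligned 32-bit load for Unsafe.getInt,
        // which raises SIGBUS on SPARC; the byte-wise read is always legal
        System.out.println(Integer.toHexString(getIntBE(data, 1)));
    }
}
```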
Jörg

On Wed, Aug 27, 2014 at 7:01 PM, Ivan Brusic <[email protected]> wrote:

> Amazing job. Great work.
>
> --
> Ivan
>
> On Tue, Aug 26, 2014 at 12:41 PM, [email protected] <[email protected]> wrote:
>
>> I fixed the issue by setting the safe LZF encoder in LZFCompressor and
>> opened a pull request:
>>
>> https://github.com/elasticsearch/elasticsearch/pull/7466
>>
>> Jörg
>>
>> On Tue, Aug 26, 2014 at 8:17 PM, [email protected] <[email protected]> wrote:
>>
>>> Still broken with lzf-compress 1.0.3:
>>>
>>> https://gist.github.com/jprante/d2d829b497db4963aea5
>>>
>>> Jörg
>>>
>>> On Tue, Aug 26, 2014 at 7:54 PM, [email protected] <[email protected]> wrote:
>>>
>>>> Thanks for the logstash mapping command. I can reproduce it now.
>>>>
>>>> It's the LZF encoder that bails out at
>>>> org.elasticsearch.common.compress.lzf.impl.UnsafeChunkEncoderBE._getInt,
>>>> which in turn uses sun.misc.Unsafe.getInt.
>>>>
>>>> I have created a gist of the JVM crash file at
>>>>
>>>> https://gist.github.com/jprante/79f4b4c0b9fd83eb1c9b
>>>>
>>>> There has been a fix in LZF lately,
>>>> https://github.com/ning/compress/commit/db7f51bddc5b7beb47da77eeeab56882c650bff7
>>>> for version 1.0.3, which was released recently.
>>>>
>>>> I will build a snapshot ES version with LZF 1.0.3 and see if this
>>>> works...
>>>>
>>>> Jörg
>>>>
>>>> On Mon, Aug 25, 2014 at 11:30 PM, <[email protected]> wrote:
>>>>
>>>>> I captured a Wireshark trace of the interaction between ES and
>>>>> Logstash 1.4.1. The error occurs even before my data is sent. Can you
>>>>> try to reproduce it on your testbed with this message I captured?
>>>>>
>>>>> curl -XPUT http://amssc103-mgmt-app2:9200/_template/logstash -d @y
>>>>>
>>>>> Contents of file 'y':
>>>>>
>>>>> {
>>>>>   "template" : "logstash-*",
>>>>>   "settings" : {
>>>>>     "index.refresh_interval" : "5s"
>>>>>   },
>>>>>   "mappings" : {
>>>>>     "_default_" : {
>>>>>       "_all" : { "enabled" : true },
>>>>>       "dynamic_templates" : [ {
>>>>>         "string_fields" : {
>>>>>           "match" : "*",
>>>>>           "match_mapping_type" : "string",
>>>>>           "mapping" : {
>>>>>             "type" : "string",
>>>>>             "index" : "analyzed",
>>>>>             "omit_norms" : true,
>>>>>             "fields" : {
>>>>>               "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256 }
>>>>>             }
>>>>>           }
>>>>>         }
>>>>>       } ],
>>>>>       "properties" : {
>>>>>         "@version" : { "type" : "string", "index" : "not_analyzed" },
>>>>>         "geoip" : {
>>>>>           "type" : "object",
>>>>>           "dynamic" : true,
>>>>>           "path" : "full",
>>>>>           "properties" : {
>>>>>             "location" : { "type" : "geo_point" }
>>>>>           }
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> On Monday, August 25, 2014 3:53:18 PM UTC-4, [email protected] wrote:
>>>>>>
>>>>>> I have no plugins installed (yet) and only changed "es.logger.level"
>>>>>> to DEBUG in logging.yml.
>>>>>>
>>>>>> elasticsearch.yml:
>>>>>> cluster.name: es-AMS1Cluster
>>>>>> node.name: "KYLIE1"
>>>>>> node.rack: amssc2client02
>>>>>> path.data: /export/home/apontet/elasticsearch/data
>>>>>> path.work: /export/home/apontet/elasticsearch/work
>>>>>> path.logs: /export/home/apontet/elasticsearch/logs
>>>>>> network.host: ********   <===== sanitized line; file contains actual server IP
>>>>>> discovery.zen.ping.multicast.enabled: false
>>>>>> discovery.zen.ping.unicast.hosts: ["s1", "s2", "s3", "s5", "s6", "s7"]   <===== also sanitized
>>>>>>
>>>>>> Thanks,
>>>>>> Tony
>>>>>>
>>>>>> On Saturday, August 23, 2014 6:29:40 AM UTC-4, Jörg Prante wrote:
>>>>>>>
>>>>>>> I tested a simple "Hello World" document on Elasticsearch 1.3.2 with
>>>>>>> Oracle JDK 1.7.0_17 64-bit Server VM, Sparc Solaris 10, default
>>>>>>> settings.
>>>>>>>
>>>>>>> No issues.
>>>>>>>
>>>>>>> So I would like to know more about the settings in
>>>>>>> elasticsearch.yml, the mappings, and the installed plugins.
>>>>>>>
>>>>>>> Jörg
>>>>>>>
>>>>>>> On Sat, Aug 23, 2014 at 11:25 AM, [email protected] <[email protected]> wrote:
>>>>>>>
>>>>>>>> I have some Solaris 10 Sparc V440/V445 servers available and can
>>>>>>>> try to reproduce over the weekend.
>>>>>>>>
>>>>>>>> Jörg
>>>>>>>>
>>>>>>>> On Sat, Aug 23, 2014 at 4:37 AM, Robert Muir <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> How big is it? Maybe I can have it anyway? I pulled two ancient
>>>>>>>>> UltraSPARCs out of my closet to try to debug your issue, but
>>>>>>>>> unfortunately they are a pita to work with (dead NVRAM battery on
>>>>>>>>> both, zeroed MAC address, etc.). I'd still love to get to the
>>>>>>>>> bottom of this.
>>>>>>>>>
>>>>>>>>> On Aug 22, 2014 3:59 PM, <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Adrien,
>>>>>>>>>> It's a bunch of garbled binary data, basically a dump of the
>>>>>>>>>> process image.
>>>>>>>>>> Tony
>>>>>>>>>>
>>>>>>>>>> On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Tony,
>>>>>>>>>>>
>>>>>>>>>>> Do you have more information in the core dump file? (cf. the
>>>>>>>>>>> "Core dump written" line that you pasted)
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 21, 2014 at 7:53 PM, <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> I installed ES 1.3.2 on a spare Solaris 11 / T4-4 SPARC server
>>>>>>>>>>>> to scale out of a small x86 machine. I get a similar exception
>>>>>>>>>>>> running ES with JAVA_OPTS=-d64.
>>>>>>>>>>>> When Logstash 1.4.1 sends the first message, I get the error
>>>>>>>>>>>> below on the ES process:
>>>>>>>>>>>>
>>>>>>>>>>>> #
>>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>>> #
>>>>>>>>>>>> #  SIGBUS (0xa) at pc=0xffffffff7a9a3d8c, pid=14473, tid=209
>>>>>>>>>>>> #
>>>>>>>>>>>> # JRE version: 7.0_25-b15
>>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode solaris-sparc compressed oops)
>>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>>> # V  [libjvm.so+0xba3d8c]  Unsafe_GetInt+0x158
>>>>>>>>>>>> #
>>>>>>>>>>>> # Core dump written. Default location:
>>>>>>>>>>>> # /export/home/elasticsearch/elasticsearch-1.3.2/core or core.14473
>>>>>>>>>>>> #
>>>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>>>>>>> #
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------  T H R E A D  ---------------
>>>>>>>>>>>>
>>>>>>>>>>>> Current thread (0x0000000107078000):  JavaThread
>>>>>>>>>>>> "elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker #147}"
>>>>>>>>>>>> daemon [_thread_in_vm, id=209, stack(0xffffffff5b800000, 0xffffffff5b840000)]
>>>>>>>>>>>>
>>>>>>>>>>>> siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN),
>>>>>>>>>>>> si_addr=0x0000000709cc09e7
>>>>>>>>>>>>
>>>>>>>>>>>> I can run ES using 32-bit Java but have to shrink ES_HEAP_SIZE
>>>>>>>>>>>> more than I want to. Any assistance would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Tony
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> After upgrading from Elasticsearch 1.0.1 to 1.2.2 I'm getting
>>>>>>>>>>>>> JVM core dumps on Solaris 10 on SPARC.
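The si_code=1 (BUS_ADRALN) in the crash above means the faulting address was not aligned for the access width: sun.misc.Unsafe.getInt/getLong issue raw word loads, which SPARC only allows at naturally aligned addresses. A heap ByteBuffer, by contrast, tolerates any offset, since the JDK assembles the value from individual bytes when needed. A minimal illustration (hypothetical demo class, not from the thread):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AlignmentDemo {
    // Reads a little-endian long at an arbitrary (possibly unaligned)
    // offset. A heap ByteBuffer handles this portably; Unsafe.getLong
    // at the same offset would raise SIGBUS (BUS_ADRALN) on SPARC.
    static long readLongLE(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).putLong(0, 0x1122334455667788L);
        // offset 3 is unaligned for a 64-bit load, yet this works everywhere
        System.out.println(Long.toHexString(readLongLE(data, 3)));
    }
}
```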
>>>>>>>>>>>>>
>>>>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>>>>> #
>>>>>>>>>>>>> #  SIGBUS (0xa) at pc=0xffffffff7e452d78, pid=15483, tid=263
>>>>>>>>>>>>> #
>>>>>>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 1.7.0_55-b13)
>>>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode solaris-sparc compressed oops)
>>>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>>>> # V  [libjvm.so+0xc52d78]  Unsafe_GetLong+0x158
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm pretty sure the problem here is that Elasticsearch is
>>>>>>>>>>>>> making increasing use of "unsafe" functions in Java, presumably
>>>>>>>>>>>>> to speed things up, and some CPUs are more picky than others
>>>>>>>>>>>>> about memory alignment. In particular, x86 will tolerate
>>>>>>>>>>>>> misaligned memory access whereas SPARC won't.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Somebody has tried to report this to Oracle in the past, and
>>>>>>>>>>>>> (understandably) Oracle has said that if you're going to use
>>>>>>>>>>>>> unsafe functions you need to understand what you're doing:
>>>>>>>>>>>>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021574
>>>>>>>>>>>>>
>>>>>>>>>>>>> A quick grep through the code of the two versions of
>>>>>>>>>>>>> Elasticsearch shows that the new use of "unsafe" memory access
>>>>>>>>>>>>> functions is in the BytesReference, MurmurHash3 and
>>>>>>>>>>>>> HyperLogLogPlusPlus classes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> bash-3.2$ git checkout v1.0.1
>>>>>>>>>>>>> Checking out files: 100% (2904/2904), done.
>>>>>>>>>>>>>
>>>>>>>>>>>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public enum UnsafeUtils {
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:            if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java:            } else if (UnsafeUtils.equals(key, get(curId, spare))) {
>>>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:                    return UnsafeUtils.equals(b1, b2);
>>>>>>>>>>>>>
>>>>>>>>>>>>> bash-3.2$ git checkout v1.2.2
>>>>>>>>>>>>> Checking out files: 100% (2220/2220), done.
>>>>>>>>>>>>>
>>>>>>>>>>>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:            return UnsafeUtils.equals(a.array(), a.arrayOffset(), b.array(), b.arrayOffset(), a.length());
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:        return UnsafeUtils.readLongLE(key, blockOffset);
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:            long k1 = UnsafeUtils.readLongLE(key, i);
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:            long k2 = UnsafeUtils.readLongLE(key, i + 8);
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/BytesRefHash.java:        if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/BytesRefHash.java:        } else if (UnsafeUtils.equals(key, get(curId, spare))) {
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public enum UnsafeUtils {
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/metrics/cardinality/HyperLogLogPlusPlus.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/metrics/cardinality/HyperLogLogPlusPlus.java:        return UnsafeUtils.readIntLE(readSpare.bytes, readSpare.offset);
>>>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>>>>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:                    return UnsafeUtils.equals(b1, b2);
>>>>>>>>>>>>>
>>>>>>>>>>>>> Presumably one of these three new uses is what is causing the
>>>>>>>>>>>>> JVM SIGBUS error I'm seeing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A quick look at the MurmurHash3 class shows that the hash128
>>>>>>>>>>>>> method accepts an arbitrary offset and passes it to an unsafe
>>>>>>>>>>>>> function with no check that it's a multiple of 8:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     public static Hash128 hash128(byte[] key, int offset, int length, long seed, Hash128 hash) {
>>>>>>>>>>>>>         long h1 = seed;
>>>>>>>>>>>>>         long h2 = seed;
>>>>>>>>>>>>>
>>>>>>>>>>>>>         if (length >= 16) {
>>>>>>>>>>>>>
>>>>>>>>>>>>>             final int len16 = length & 0xFFFFFFF0; // highest multiple of 16 that is lower than or equal to length
>>>>>>>>>>>>>             final int end = offset + len16;
>>>>>>>>>>>>>             for (int i = offset; i < end; i += 16) {
>>>>>>>>>>>>>                 long k1 = UnsafeUtils.readLongLE(key, i);
>>>>>>>>>>>>>                 long k2 = UnsafeUtils.readLongLE(key, i + 8);
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a recipe for generating JVM core dumps on architectures
>>>>>>>>>>>>> such as SPARC, Itanium and PowerPC that don't support unaligned
>>>>>>>>>>>>> 64-bit memory access.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does Elasticsearch have any policy for support of hardware
>>>>>>>>>>>>> other than x86? If not, I don't think many people would care,
>>>>>>>>>>>>> but you really ought to say so clearly on your platform support
>>>>>>>>>>>>> page. If you do intend to support non-x86 architectures then
>>>>>>>>>>>>> you need to be much more careful about the use of unsafe memory
>>>>>>>>>>>>> accesses.
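One portable way to keep hash128's behaviour without the alignment hazard is to assemble each little-endian word a byte at a time, which is legal at any offset on any architecture. A sketch of what such a replacement for UnsafeUtils.readLongLE could look like (hypothetical helper, not the actual Elasticsearch fix):

```java
public class PortableRead {
    // Little-endian long read that works at any offset: each byte is
    // loaded individually, so no unaligned word access ever happens.
    static long readLongLE(byte[] buf, int off) {
        return  (buf[off]     & 0xFFL)
             | ((buf[off + 1] & 0xFFL) << 8)
             | ((buf[off + 2] & 0xFFL) << 16)
             | ((buf[off + 3] & 0xFFL) << 24)
             | ((buf[off + 4] & 0xFFL) << 32)
             | ((buf[off + 5] & 0xFFL) << 40)
             | ((buf[off + 6] & 0xFFL) << 48)
             | ((buf[off + 7] & 0xFFL) << 56);
    }

    public static void main(String[] args) {
        byte[] key = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09};
        // offset 1 is unaligned for a 64-bit load, yet this read is safe
        System.out.println(Long.toHexString(readLongLE(key, 1)));
    }
}
```

The byte-wise form is usually slower than a single word load on x86, which is presumably why the Unsafe variants existed in the first place; a JIT can often still vectorize or fuse such reads.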
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Adrien Grand
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHLmXs3tp9KPBin9dpr0oU9YA%2B4kgPvcOFtD%2BytPdLd5Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
