I was able to trim the heap size and, consequently, the core file down to about 530m.
Tony

On Monday, August 25, 2014 3:41:14 PM UTC-4, [email protected] wrote:
>
> It's as big as my ES_HEAP_SIZE parameter, 30g.
>
> Tony
>
> On Friday, August 22, 2014 10:37:39 PM UTC-4, Robert Muir wrote:
>>
>> How big is it? Maybe I can have it anyway? I pulled two ancient
>> ultrasparcs out of my closet to try to debug your issue, but unfortunately
>> they are a pita to work with (dead nvram battery on both, zeroed mac
>> address, etc.) I'd still love to get to the bottom of this.
>> On Aug 22, 2014 3:59 PM, <[email protected]> wrote:
>>
>>> Hi Adrien,
>>> It's a bunch of garbled binary data, basically a dump of the process
>>> image.
>>> Tony
>>>
>>>
>>> On Thursday, August 21, 2014 6:36:12 PM UTC-4, Adrien Grand wrote:
>>>>
>>>> Hi Tony,
>>>>
>>>> Do you have more information in the core dump file? (cf. the "Core dump
>>>> written" line that you pasted)
>>>>
>>>>
>>>> On Thu, Aug 21, 2014 at 7:53 PM, <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>> I installed ES 1.3.2 on a spare Solaris 11/ T4-4 SPARC server to scale
>>>>> out of a small x86 machine. I get a similar exception running ES with
>>>>> JAVA_OPTS=-d64. When Logstash 1.4.1 sends the first message I get the
>>>>> error below on the ES process:
>>>>>
>>>>>
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> # SIGBUS (0xa) at pc=0xffffffff7a9a3d8c, pid=14473, tid=209
>>>>> #
>>>>> # JRE version: 7.0_25-b15
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode solaris-sparc compressed oops)
>>>>> # Problematic frame:
>>>>> # V [libjvm.so+0xba3d8c] Unsafe_GetInt+0x158
>>>>> #
>>>>> # Core dump written. Default location: /export/home/elasticsearch/elasticsearch-1.3.2/core or core.14473
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> # http://bugreport.sun.com/bugreport/crash.jsp
>>>>> #
>>>>>
>>>>> --------------- T H R E A D ---------------
>>>>>
>>>>> Current thread (0x0000000107078000): JavaThread "elasticsearch[KYLIE1][http_server_worker][T#17]{New I/O worker #147}" daemon [_thread_in_vm, id=209, stack(0xffffffff5b800000, 0xffffffff5b840000)]
>>>>>
>>>>> siginfo:si_signo=SIGBUS: si_errno=0, si_code=1 (BUS_ADRALN), si_addr=0x0000000709cc09e7
>>>>>
>>>>>
>>>>> I can run ES using 32-bit Java but have to shrink ES_HEAP_SIZE more
>>>>> than I want to. Any assistance would be appreciated.
>>>>>
>>>>> Regards,
>>>>> Tony
>>>>>
>>>>>
>>>>> On Tuesday, July 22, 2014 5:43:28 AM UTC-4, David Roberts wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> After upgrading from Elasticsearch 1.0.1 to 1.2.2 I'm getting JVM
>>>>>> core dumps on Solaris 10 on SPARC.
>>>>>>
>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>> #
>>>>>> # SIGBUS (0xa) at pc=0xffffffff7e452d78, pid=15483, tid=263
>>>>>> #
>>>>>> # JRE version: Java(TM) SE Runtime Environment (7.0_55-b13) (build 1.7.0_55-b13)
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode solaris-sparc compressed oops)
>>>>>> # Problematic frame:
>>>>>> # V [libjvm.so+0xc52d78] Unsafe_GetLong+0x158
>>>>>>
>>>>>> I'm pretty sure the problem here is that Elasticsearch is making
>>>>>> increasing use of "unsafe" functions in Java, presumably to speed things
>>>>>> up, and some CPUs are more picky than others about memory alignment. In
>>>>>> particular, x86 will tolerate misaligned memory access whereas SPARC
>>>>>> won't.
>>>>>>
>>>>>> Somebody has tried to report this to Oracle in the past and
>>>>>> (understandably) Oracle has said that if you're going to use unsafe
>>>>>> functions you need to understand what you're doing:
>>>>>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021574
>>>>>>
>>>>>> A quick grep through the code of the two versions of Elasticsearch
>>>>>> shows that the new use of "unsafe" memory access functions is in the
>>>>>> BytesReference, MurmurHash3 and HyperLogLogPlusPlus classes:
>>>>>>
>>>>>> bash-3.2$ git checkout v1.0.1
>>>>>> Checking out files: 100% (2904/2904), done.
>>>>>>
>>>>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>>>>> ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public enum UnsafeUtils {
>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java: if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/bucket/BytesRefHash.java: } else if (UnsafeUtils.equals(key, get(curId, spare))) {
>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java: return UnsafeUtils.equals(b1, b2);
>>>>>>
>>>>>> bash-3.2$ git checkout v1.2.2
>>>>>> Checking out files: 100% (2220/2220), done.
>>>>>>
>>>>>> bash-3.2$ find . -name '*.java' | xargs grep UnsafeUtils
>>>>>> ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>> ./src/main/java/org/elasticsearch/common/bytes/BytesReference.java: return UnsafeUtils.equals(a.array(), a.arrayOffset(), b.array(), b.arrayOffset(), a.length());
>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java: return UnsafeUtils.readLongLE(key, blockOffset);
>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java: long k1 = UnsafeUtils.readLongLE(key, i);
>>>>>> ./src/main/java/org/elasticsearch/common/hash/MurmurHash3.java: long k2 = UnsafeUtils.readLongLE(key, i + 8);
>>>>>> ./src/main/java/org/elasticsearch/common/util/BytesRefHash.java: if (id == -1L || UnsafeUtils.equals(key, get(id, spare))) {
>>>>>> ./src/main/java/org/elasticsearch/common/util/BytesRefHash.java: } else if (UnsafeUtils.equals(key, get(curId, spare))) {
>>>>>> ./src/main/java/org/elasticsearch/common/util/UnsafeUtils.java:public enum UnsafeUtils {
>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/metrics/cardinality/HyperLogLogPlusPlus.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>> ./src/main/java/org/elasticsearch/search/aggregations/metrics/cardinality/HyperLogLogPlusPlus.java: return UnsafeUtils.readIntLE(readSpare.bytes, readSpare.offset);
>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java:import org.elasticsearch.common.util.UnsafeUtils;
>>>>>> ./src/test/java/org/elasticsearch/benchmark/common/util/BytesRefComparisonsBenchmark.java: return UnsafeUtils.equals(b1, b2);
>>>>>>
>>>>>> Presumably one of these three new uses is what is causing the JVM
>>>>>> SIGBUS error I'm seeing.
>>>>>>
>>>>>> A quick look at the MurmurHash3 class shows that the hash128 method
>>>>>> accepts an arbitrary offset and passes it to an unsafe function with no
>>>>>> check that it's a multiple of 8:
>>>>>>
>>>>>>     public static Hash128 hash128(byte[] key, int offset, int length, long seed, Hash128 hash) {
>>>>>>         long h1 = seed;
>>>>>>         long h2 = seed;
>>>>>>
>>>>>>         if (length >= 16) {
>>>>>>
>>>>>>             final int len16 = length & 0xFFFFFFF0; // higher multiple of 16 that is lower than or equal to length
>>>>>>             final int end = offset + len16;
>>>>>>             for (int i = offset; i < end; i += 16) {
>>>>>>                 long k1 = UnsafeUtils.readLongLE(key, i);
>>>>>>                 long k2 = UnsafeUtils.readLongLE(key, i + 8);
>>>>>>
>>>>>> This is a recipe for generating JVM core dumps on architectures such
>>>>>> as SPARC, Itanium and PowerPC that don't support unaligned 64 bit memory
>>>>>> access.
>>>>>>
>>>>>> Does Elasticsearch have any policy for support of hardware other than
>>>>>> x86? If not, I don't think many people would care but you really ought
>>>>>> to clearly say so on your platform support page. If you do intend to
>>>>>> support non-x86 architectures then you need to be much more careful
>>>>>> about the use of unsafe memory accesses.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> David
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/eb7f4c23-b63e-4c2e-87c3-029fc58449fc%40googlegroups.com
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>>> Adrien Grand
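
For reference, the alignment problem David describes in hash128 can be sketched in plain Java. This is only an illustration under stated assumptions, not the Elasticsearch code or its eventual fix: the class name AlignedReads and its methods are hypothetical. It reads little-endian values one byte at a time, so the offset never has to be a multiple of 4 or 8, and it checks the si_addr from Tony's crash log, which is odd and therefore consistent with the BUS_ADRALN code reported for Unsafe_GetInt.

```java
// Hypothetical helper for illustration only; not part of Elasticsearch.
final class AlignedReads {
    private AlignedReads() {}

    /** Reads a little-endian long byte by byte; valid at any array offset. */
    public static long readLongLE(byte[] src, int offset) {
        long value = 0L;
        // Start from the most significant byte (highest index) and shift down.
        for (int i = 7; i >= 0; i--) {
            value = (value << 8) | (src[offset + i] & 0xFFL);
        }
        return value;
    }

    /** Reads a little-endian int byte by byte; valid at any array offset. */
    public static int readIntLE(byte[] src, int offset) {
        int value = 0;
        for (int i = 3; i >= 0; i--) {
            value = (value << 8) | (src[offset + i] & 0xFF);
        }
        return value;
    }

    /** True if address is a multiple of size; size must be a power of two. */
    public static boolean isAligned(long address, int size) {
        return (address & (size - 1)) == 0;
    }

    public static void main(String[] args) {
        byte[] key = new byte[24];
        for (int i = 0; i < key.length; i++) {
            key[i] = (byte) i;
        }
        // Offset 3 is not a multiple of 8, yet the byte-wise read is safe.
        System.out.println(Long.toHexString(readLongLE(key, 3)));
        // si_addr from the crash log above, tested against a 4-byte int read.
        System.out.println(isAligned(0x0000000709cc09e7L, 4));
    }
}
```

The byte-wise version trades one wide load for several single-byte loads, which is the cost of accepting arbitrary offsets. A caller that wants to keep a wide read on the fast path could use a check like isAligned to decide when that is safe and fall back to the byte-wise read otherwise.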
