[ https://issues.apache.org/jira/browse/HADOOP-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-12941.
-------------------------------------
    Resolution: Won't Fix

There is no IA64 any more, sorry

> abort in Unsafe_GetLong when running IA64 HPUX 64bit mode
> ----------------------------------------------------------
>
>                 Key: HADOOP-12941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12941
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: hpux IA64 running 64bit mode
>            Reporter: gene bradley
>            Priority: Major
>
> Now that we have a core to look at we can sorta see what is going on
> #14 0x9fffffffaf000dd0 in Java native_call_stub frame
> #15 0x9fffffffaf014470 in JNI frame: sun.misc.Unsafe::getLong (java.lang.Object, long) ->long
> #16 0x9fffffffaf0067a0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (byte[], int, int, byte[], int, int) ->int bci: 74
> #17 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (java.lang.Object, int, int, java.lang.Object, int, int) ->int bci: 16
> #18 0x9fffffffaf006720 in interpreted frame: org.apache.hadoop.hbase.util.Bytes::compareTo (byte[], int, int, byte[], int, int) ->int bci: 11
> #19 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compareRowKey (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 36
> #20 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 3
> #21 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (java.lang.Object, java.lang.Object) ->int bci: 9
> ;; Line: 400
> 0xc00000003ad84d30:0 <Unsafe_GetLong+0x130>: (p1) ld8 r45=[r34]
> 0xc00000003ad84d30:1 <Unsafe_GetLong+0x131>: adds r34=16,r32
> 0xc00000003ad84d30:2 <Unsafe_GetLong+0x132>: adds ret0=8,r32;;
> 0xc00000003ad84d40:0 <Unsafe_GetLong+0x140>: add ret1=r35,r45    <==== r35 is off
> 0xc00000003ad84d40:1 <Unsafe_GetLong+0x141>: ld8 r35=[r34],24
> 0xc00000003ad84d40:2 <Unsafe_GetLong+0x142>: nop.i 0x0
> 0xc00000003ad84d50:0 <Unsafe_GetLong+0x150>: ld8 r41=[ret0];;
> 0xc00000003ad84d50:1 <Unsafe_GetLong+0x151>: ld8.s r49=[r34],-24
> 0xc00000003ad84d50:2 <Unsafe_GetLong+0x152>: nop.i 0x0
> 0xc00000003ad84d60:0 <Unsafe_GetLong+0x160>: ld8 r39=[ret1];;    <=== abort
> 0xc00000003ad84d60:1 <Unsafe_GetLong+0x161>: ld8 ret0=[r35]
> 0xc00000003ad84d60:2 <Unsafe_GetLong+0x162>: nop.i 0x0;;
> 0xc00000003ad84d70:0 <Unsafe_GetLong+0x170>: cmp.ne.unc p1=r0,ret0;;M,MI
> 0xc00000003ad84d70:1 <Unsafe_GetLong+0x171>: (p1) mov r48=r41
> 0xc00000003ad84d70:2 <Unsafe_GetLong+0x172>: (p1) chk.s.i r49,Unsafe_GetLong+0x290
> (gdb) x /10i $pc-48*2
> 0x9fffffffaf000d70: flushrs MMI
> 0x9fffffffaf000d71: mov r44=r32
> 0x9fffffffaf000d72: mov r45=r33
> 0x9fffffffaf000d80: mov r46=r34 MMI
> 0x9fffffffaf000d81: mov r47=r35
> 0x9fffffffaf000d82: mov r48=r36
> 0x9fffffffaf000d90: mov r49=r37 MMI
> 0x9fffffffaf000d91: mov r50=r38
> 0x9fffffffaf000d92: mov r51=r39
> 0x9fffffffaf000da0: adds r14=0x270,r4 MMI
> (gdb) p /x $r35
> $9 = 0x22
> (gdb) x /x $ret1
> 0x9ffffffe1d0d2bda: 0x677a68676c78743a
> (gdb) x /x $r45+0x22
> 0x9ffffffe1d0d2bda: 0x677a68676c78743a
> So here is the problem, this is a 64bit JVM
> 0 : /opt/java8/bin/IA64W/java
> 1 : -Djava.util.logging.config.file=/test28/gzh/tomcat/conf/logging.properties
> 2 : -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> 3 : -Dorg.apache.catalina.security.SecurityListener.UMASK=022
> 4 : -server
> 5 : -XX:PermSize=128m
> 6 : -XX:MaxPermSize=256m
> 7 : -Djava.endorsed.dirs=/test28/gzh/tomcat/endorsed
> 8 : -classpath
> 9 : /test28/gzh/tomcat/bin/bootstrap.jar:/test28/gzh/tomcat/bin/tomcat-juli.jar
> 10 : -Dcatalina.base=/test28/gzh/tomcat
> 11 : -Dcatalina.home=/test28/gzh/tomcat
> 12 : -Djava.io.tmpdir=/test28/gzh/tomcat/temp
> 13 : org.apache.catalina.startup.Bootstrap
> 14 : start
> Since they are not passing and -Xmx values we are taking defaults which look
> at the system resources. So what is happening here is a 32 bit word aligned
> address is being used to index into a byte array
> (gdb) jo 0x9ffffffe1d0d2bb8
> _mark = 0x0000000000000001, _klass = 0x9fffffffa8c00768, instance of type [B
> length of the array: 118
> 0 0 0 102 0 0 0 8 0 70 103 122 104 103 108 120 116 58 70 83 78 95 50 48 49 53
> 49 48 50 50 44 65 44 49 52 52 53 52 55 57 57 51 51 57 53 56 46 52 56 54 55 50
> 48 51 49 99 57 97 101 52 57 101 97 101 49 100 56 49 51 53 51 99 99 97 97 54
> 98 56 100 46 4 105 110 102 111 115 101 113 110 117 109 68 117 114 105 110 103
> 79 112 101 110 0 0 1 80 -6 96 -95 -48 4 0 0 0 0 0 0 0 4
> This is the whole string
> (gdb) x /2s 0x9ffffffe1d0d2bd8
> 0x9ffffffe1d0d2bd8: ""
> 0x9ffffffe1d0d2bd9: "Fgzhglxt:FSN_20151022,A,1445479933958.48672031c9ae49eae1d81353ccaa6b8d.\004infoseqnumDuringOpen"
> To me this is a bug in the callee potentially in
> org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo
> Why are they calling Unsafe_GetLong on a byte array, there is no checking of
> alignment and I really think this is a bug on their part. As far as I know,
> GetLong expects 64 bit alignment
> I did find some other 64 bit users who saw this with the same stack trace as
> this customer
> https://issues.apache.org/jira/browse/PHOENIX-1438
> http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.devel/39017
> the fix would go here by adding a test for ia64
> looking at the code from a bug they are checking for if the box is sparc.
>     static Comparer<byte[]> getBestComparer() {
> +      if (System.getProperty("os.arch").equals("sparc")) {    <====
> +        if (LOG.isTraceEnabled()) {
> +          LOG.trace("Lexicographical comparer selected for "
> +              + "byte aligned system architecture");
> +        }
> +        return lexicographicalComparerJavaImpl();
> +      }
>        try {
>          Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
> so this is 'fixable' from a java class perspective.
> Hari said he will talk with his open source contact
> This Hadoop bug report points to the same problem in the same code:
> https://issues.apache.org/jira/browse/HADOOP-11466
> In that case the symptom of the unaligned accesses was bad performance
> instead of a crash. This shows diffs for that fix:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201501.mbox/%3cb19d5f83ca7148b782e5b432817b6...@git.apache.org%3E
> Those diffs show that fix only avoids the bad code when running on "sparc".
> They really should have instead avoided that bad code for every architecture
> other than x86. They should not be assuming that that FastByteComparisons
> enhancement will work on other processors and actually improves performance.
> On processors that do allow unaligned accesses at much cost they are just
> creating bad performance that will be hard for anyone to ever find.
> For all IA64 customers this will be an issue when running 64 bit. The IA
> processor enforces alignment on instruction types
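
For anyone reading this thread later, the two points in the report can be sketched in a few lines of Java: the address that ld8 faulted on is not 8-byte aligned, and the fast path could be gated on an allowlist of architectures instead of a single "sparc" test. This is only an illustration under stated assumptions. The class name, both helper names, and the set of os.arch strings are made up for the example and are not taken from the Hadoop or HBase source; the addresses are the ones gdb printed in the report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UnalignedAccessSketch {
    // Assumption: os.arch strings treated as tolerating unaligned 8-byte loads.
    // This list is illustrative, not the list used by any Hadoop/HBase release.
    private static final Set<String> UNALIGNED_OK =
        new HashSet<>(Arrays.asList("x86", "i386", "amd64", "x86_64"));

    // True when the address is a multiple of 8, i.e. naturally aligned for a long.
    static boolean isAlignedForLong(long address) {
        return (address & 7L) == 0L;
    }

    // Hypothetical gate: use the Unsafe-based comparer only on allowlisted
    // architectures and fall back to the pure Java comparer everywhere else.
    static boolean mayUseUnsafeComparer(String osArch) {
        return osArch != null
            && UNALIGNED_OK.contains(osArch.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        long arrayObject = 0x9ffffffe1d0d2bb8L; // object printed by "jo" in the report
        long offset = 0x22L;                    // value of r35 in the report
        long target = arrayObject + offset;     // address loaded by the faulting ld8

        System.out.println("target = 0x" + Long.toHexString(target));
        System.out.println("aligned for long: " + isAlignedForLong(target));
        System.out.println("unsafe comparer allowed here: "
            + mayUseUnsafeComparer(System.getProperty("os.arch")));
    }
}
```

An allowlist fails safe: an architecture nobody has tested gets the slower pure Java comparer rather than a possible abort or a silent performance penalty.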