[
https://issues.apache.org/jira/browse/HADOOP-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409994#comment-15409994
]
gene bradley commented on HADOOP-12941:
---------------------------------------
Hi Colin,
How can I get this fixed? Again, it’s a real simple fix.
Gene
> abort in Unsafe_GetLong when running IA64 HPUX 64bit mode
> ----------------------------------------------------------
>
> Key: HADOOP-12941
> URL: https://issues.apache.org/jira/browse/HADOOP-12941
> Project: Hadoop Common
> Issue Type: Bug
> Environment: hpux IA64 running 64bit mode
> Reporter: gene bradley
>
> Now that we have a core to look at we can sorta see what is going on:
>
> #14 0x9fffffffaf000dd0 in Java native_call_stub frame
> #15 0x9fffffffaf014470 in JNI frame:
>     sun.misc.Unsafe::getLong (java.lang.Object, long) ->long
> #16 0x9fffffffaf0067a0 in interpreted frame:
>     org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo
>     (byte[], int, int, byte[], int, int) ->int bci: 74
> #17 0x9fffffffaf0066e0 in interpreted frame:
>     org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo
>     (java.lang.Object, int, int, java.lang.Object, int, int) ->int bci: 16
> #18 0x9fffffffaf006720 in interpreted frame:
>     org.apache.hadoop.hbase.util.Bytes::compareTo
>     (byte[], int, int, byte[], int, int) ->int bci: 11
> #19 0x9fffffffaf0066e0 in interpreted frame:
>     org.apache.hadoop.hbase.KeyValue$KVComparator::compareRowKey
>     (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 36
> #20 0x9fffffffaf0066e0 in interpreted frame:
>     org.apache.hadoop.hbase.KeyValue$KVComparator::compare
>     (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 3
> #21 0x9fffffffaf0066e0 in interpreted frame:
>     org.apache.hadoop.hbase.KeyValue$KVComparator::compare
>     (java.lang.Object, java.lang.Object) ->int bci: 9
>
> ;; Line: 400
> 0xc00000003ad84d30:0 <Unsafe_GetLong+0x130>: (p1) ld8 r45=[r34]
> 0xc00000003ad84d30:1 <Unsafe_GetLong+0x131>:      adds r34=16,r32
> 0xc00000003ad84d30:2 <Unsafe_GetLong+0x132>:      adds ret0=8,r32;;
> 0xc00000003ad84d40:0 <Unsafe_GetLong+0x140>:      add ret1=r35,r45    <==== r35 is off
> 0xc00000003ad84d40:1 <Unsafe_GetLong+0x141>:      ld8 r35=[r34],24
> 0xc00000003ad84d40:2 <Unsafe_GetLong+0x142>:      nop.i 0x0
> 0xc00000003ad84d50:0 <Unsafe_GetLong+0x150>:      ld8 r41=[ret0];;
> 0xc00000003ad84d50:1 <Unsafe_GetLong+0x151>:      ld8.s r49=[r34],-24
> 0xc00000003ad84d50:2 <Unsafe_GetLong+0x152>:      nop.i 0x0
> 0xc00000003ad84d60:0 <Unsafe_GetLong+0x160>:      ld8 r39=[ret1];;    <=== abort
> 0xc00000003ad84d60:1 <Unsafe_GetLong+0x161>:      ld8 ret0=[r35]
> 0xc00000003ad84d60:2 <Unsafe_GetLong+0x162>:      nop.i 0x0;;
> 0xc00000003ad84d70:0 <Unsafe_GetLong+0x170>:      cmp.ne.unc p1=r0,ret0;;    M,MI
> 0xc00000003ad84d70:1 <Unsafe_GetLong+0x171>: (p1) mov r48=r41
> 0xc00000003ad84d70:2 <Unsafe_GetLong+0x172>: (p1) chk.s.i r49,Unsafe_GetLong+0x290
>
> (gdb) x /10i $pc-48*2
> 0x9fffffffaf000d70: flushrs              MMI
> 0x9fffffffaf000d71: mov r44=r32
> 0x9fffffffaf000d72: mov r45=r33
> 0x9fffffffaf000d80: mov r46=r34          MMI
> 0x9fffffffaf000d81: mov r47=r35
> 0x9fffffffaf000d82: mov r48=r36
> 0x9fffffffaf000d90: mov r49=r37          MMI
> 0x9fffffffaf000d91: mov r50=r38
> 0x9fffffffaf000d92: mov r51=r39
> 0x9fffffffaf000da0: adds r14=0x270,r4    MMI
> (gdb) p /x $r35
> $9 = 0x22
> (gdb) x /x $ret1
> 0x9ffffffe1d0d2bda: 0x677a68676c78743a
> (gdb) x /x $r45+0x22
> 0x9ffffffe1d0d2bda: 0x677a68676c78743a
>
> So here is the problem,
> this is a 64-bit JVM:
>
>   0 : /opt/java8/bin/IA64W/java
>   1 : -Djava.util.logging.config.file=/test28/gzh/tomcat/conf/logging.properties
>   2 : -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>   3 : -Dorg.apache.catalina.security.SecurityListener.UMASK=022
>   4 : -server
>   5 : -XX:PermSize=128m
>   6 : -XX:MaxPermSize=256m
>   7 : -Djava.endorsed.dirs=/test28/gzh/tomcat/endorsed
>   8 : -classpath
>   9 : /test28/gzh/tomcat/bin/bootstrap.jar:/test28/gzh/tomcat/bin/tomcat-juli.jar
>  10 : -Dcatalina.base=/test28/gzh/tomcat
>  11 : -Dcatalina.home=/test28/gzh/tomcat
>  12 : -Djava.io.tmpdir=/test28/gzh/tomcat/temp
>  13 : org.apache.catalina.startup.Bootstrap
>  14 : start
>
> Since they are not passing any
> -Xmx values we are taking defaults which look at the system resources. So
> what is happening here is a 32 bit word aligned address is being used to
> index into a byte array:
>
> (gdb) jo 0x9ffffffe1d0d2bb8
> _mark = 0x0000000000000001, _klass = 0x9fffffffa8c00768, instance of type [B
> length of the array: 118
> 0 0 0 102 0 0 0 8 0 70 103 122 104 103 108 120 116 58 70 83 78
> 95 50 48 49 53 49 48 50 50 44 65 44 49 52 52 53 52 55 57 57 51 51 57 53 56 46
> 52 56 54 55 50 48 51 49 99 57 97 101 52 57 101 97 101 49 100 56 49 51 53 51
> 99 99 97 97 54 98 56 100 46 4 105 110 102 111 115 101 113 110 117 109 68 117
> 114 105 110 103 79 112 101 110 0 0 1 80 -6 96 -95 -48 4 0 0 0 0 0 0 0 4
>
> This is the whole string:
>
> (gdb) x /2s 0x9ffffffe1d0d2bd8
> 0x9ffffffe1d0d2bd8: ""
> 0x9ffffffe1d0d2bd9: "Fgzhglxt:FSN_20151022,A,1445479933958.48672031c9ae49eae1d81353ccaa6b8d.\004infoseqnumDuringOpen"
>
> To me this is a bug in the callee, potentially in
> org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo
>
> Why are they calling Unsafe_GetLong on a byte array? There is no checking of
> alignment, and I really think this is a bug on their part. As far as I know,
> GetLong expects 64-bit alignment. I did find some other 64-bit users who saw
> this with the same stack trace as this customer:
> https://issues.apache.org/jira/browse/PHOENIX-1438
> http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.devel/39017
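
The address arithmetic in the gdb session can be replayed in a few lines of Java. This is only an illustrative sketch: the class and method names (AlignmentCheck, isAligned, asAscii) are made up for this note, and the constants are copied from the dump above. It adds the $r35 offset to the array object address, checks the result for 8-byte alignment, and decodes the word gdb printed at that address.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AlignmentCheck {
    /** True when address is a multiple of size (size must be a power of two). */
    public static boolean isAligned(long address, int size) {
        return (address & (size - 1)) == 0;
    }

    /** Renders a 64-bit word as the 8 ASCII characters it holds, most significant byte first. */
    public static String asAscii(long word) {
        byte[] raw = ByteBuffer.allocate(Long.BYTES).putLong(word).array();
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        long arrayObject = 0x9ffffffe1d0d2bb8L; // object address given to "jo" in the dump
        long offset = 0x22L;                    // value of $r35 in the dump
        long target = arrayObject + offset;     // address read by the faulting ld8

        System.out.println(Long.toHexString(target));      // 9ffffffe1d0d2bda
        System.out.println(isAligned(target, 8));          // false
        System.out.println(target & 7);                    // 2
        System.out.println(asAscii(0x677a68676c78743aL));  // gzhglxt:
    }
}
```

The decoded word is the first eight characters of the row key, which begins at index 10 of the array (0x22 minus the 0x18 bytes between the object address and element 0, as the x /2s output implies). An 8-byte load at that address is therefore only 2-byte aligned.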
> The fix would go here, by adding a test for ia64. Looking at the code from
> an earlier bug fix, they are already checking whether the box is sparc:
>
>    static Comparer<byte[]> getBestComparer() {
> +    if (System.getProperty("os.arch").equals("sparc")) {    <====
> +      if (LOG.isTraceEnabled()) {
> +        LOG.trace("Lexicographical comparer selected for "
> +            + "byte aligned system architecture");
> +      }
> +      return lexicographicalComparerJavaImpl();
> +    }
>      try {
>        Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
>
> So this is 'fixable' from a Java class perspective. Hari said he will talk
> with his open source contact.
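
For reference, a byte-at-a-time comparer never issues a multi-byte load, so alignment cannot matter. The sketch below shows the general technique only. It is not the Hadoop or HBase source, and the class name PureJavaCompare is hypothetical.

```java
public class PureJavaCompare {
    /**
     * Lexicographically compares two byte ranges, treating each byte as unsigned.
     * Only single-byte array reads are used, so no alignment requirement applies.
     */
    public static int compareTo(byte[] b1, int s1, int l1,
                                byte[] b2, int s2, int l2) {
        // Same array, same range: nothing to compare.
        if (b1 == b2 && s1 == s2 && l1 == l2) {
            return 0;
        }
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff; // widen to an unsigned value
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // One range is a prefix of the other: the shorter one sorts first.
        return l1 - l2;
    }
}
```

The cost is one comparison per byte instead of one per 8-byte word, which is the trade-off the Unsafe-based comparer was written to avoid.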
> This Hadoop bug report points to the same problem in the same code:
> https://issues.apache.org/jira/browse/HADOOP-11466
> In that case the symptom of the unaligned accesses was bad performance
> instead of a crash. This shows diffs for that fix:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201501.mbox/%[email protected]%3E
> Those diffs show that the fix only avoids the bad code when running on "sparc".
> They really should have instead avoided that bad code for every architecture
> other than x86. They should not be assuming that the FastByteComparisons
> enhancement will work on other processors and actually improve performance.
> On processors that do allow unaligned accesses, but at great cost, they are
> just creating bad performance that will be hard for anyone to ever find.
> For all IA64 customers this will be an issue when running 64-bit. The IA
> processor enforces alignment on instruction types
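
The "every architecture other than x86" suggestion amounts to an allowlist rather than a single check for "sparc". A minimal sketch of that idea follows. The class name, the method names, and the exact set of os.arch strings are assumptions made for illustration and are not taken from the Hadoop source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UnalignedAccessPolicy {
    // Assumed os.arch spellings for x86-family JVMs; adjust for the JVMs actually in use.
    private static final Set<String> UNALIGNED_OK =
            new HashSet<>(Arrays.asList("i386", "x86", "amd64", "x86_64"));

    /** True only for architectures explicitly listed as tolerating unaligned loads. */
    public static boolean allowsFastUnaligned(String osArch) {
        return osArch != null && UNALIGNED_OK.contains(osArch);
    }

    /** Decision for the running JVM; any unlisted architecture gets the pure-Java path. */
    public static boolean useUnsafeComparer() {
        return allowsFastUnaligned(System.getProperty("os.arch"));
    }
}
```

With this shape, an architecture nobody thought about (ia64, sparc, or anything newer) falls back to the safe comparer by default, instead of crashing or silently running slowly until someone adds it to a denylist.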
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)