[ 
https://issues.apache.org/jira/browse/HADOOP-12941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203364#comment-15203364
 ] 

Colin Patrick McCabe commented on HADOOP-12941:
-----------------------------------------------

Hi [~genebradley],

I recommend keeping your description short so that people can quickly look at 
it and see what this bug is about.  You can provide additional details in the 
first few comments if necessary.

The discussion on HADOOP-11466 does explain why we chose to blacklist SPARC 
rather than whitelisting x86.  The biggest reason is that there are many x86 
variants with slightly different names, and we were afraid of missing one and 
causing a fallback to unoptimized performance on x86.  The other reason is that 
most modern architectures support unaligned access, so disabling the 
optimization is getting less and less relevant.
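
To make the blacklist approach concrete, here is a minimal sketch of what 
such a check could look like if it were extended beyond SPARC.  This is not 
the code that was committed for HADOOP-11466.  The class name, the method 
name, and the list of prefixes are hypothetical, and the exact "os.arch" 
strings reported on Itanium (assumed here to start with "ia64" or "IA64") 
would need to be confirmed on a real system.

```java
import java.util.Locale;

public class ArchCheck {
    // Hypothetical blacklist: architectures assumed NOT to tolerate
    // unaligned 8-byte loads. The real os.arch strings vary by JVM vendor
    // and must be verified on the target platform.
    private static final String[] STRICT_ALIGNMENT_PREFIXES = {"sparc", "ia64"};

    // Returns true when the Unsafe-based comparer should be avoided.
    static boolean isStrictAlignmentArch(String osArch) {
        if (osArch == null) {
            return true; // unknown architecture: take the conservative path
        }
        String arch = osArch.toLowerCase(Locale.ROOT);
        for (String prefix : STRICT_ALIGNMENT_PREFIXES) {
            if (arch.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        String choice = isStrictAlignmentArch(arch)
            ? "pure-Java comparer"
            : "Unsafe comparer";
        System.out.println(arch + " -> " + choice);
    }
}
```

Matching on a lower-cased prefix rather than an exact string is a design 
choice that catches variants such as "sparcv9".  Every x86 name still falls 
through to the optimized path, which is the property the blacklist was 
meant to preserve.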

I see that you are running on Itanium.  This is actually a platform I'm not 
familiar with.  It seems like both Red Hat and Microsoft have dropped support 
for Itanium in their latest releases, so I'm not sure how much more of this 
architecture we will see.

Would you care to post a patch fixing the problem on Itanium?

> abort in Unsafe_GetLong when running IA64 HPUX 64bit mode 
> ----------------------------------------------------------
>
>                 Key: HADOOP-12941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12941
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: hpux IA64  running 64bit mode 
>            Reporter: gene bradley
>
> Now that we have a core to look at we can sorta see what is going on
>
> #14 0x9fffffffaf000dd0 in Java native_call_stub frame
> #15 0x9fffffffaf014470 in JNI frame: sun.misc.Unsafe::getLong (java.lang.Object, long) ->long
> #16 0x9fffffffaf0067a0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (byte[], int, int, byte[], int, int) ->int bci: 74
> #17 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (java.lang.Object, int, int, java.lang.Object, int, int) ->int bci: 16
> #18 0x9fffffffaf006720 in interpreted frame: org.apache.hadoop.hbase.util.Bytes::compareTo (byte[], int, int, byte[], int, int) ->int bci: 11
> #19 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compareRowKey (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 36
> #20 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (org.apache.hadoop.hbase.Cell, org.apache.hadoop.hbase.Cell) ->int bci: 3
> #21 0x9fffffffaf0066e0 in interpreted frame: org.apache.hadoop.hbase.KeyValue$KVComparator::compare (java.lang.Object, java.lang.Object) ->int bci: 9
>
> ;; Line: 400
> 0xc00000003ad84d30:0 <Unsafe_GetLong+0x130>:    (p1)  ld8              r45=[r34]
> 0xc00000003ad84d30:1 <Unsafe_GetLong+0x131>:          adds             r34=16,r32
> 0xc00000003ad84d30:2 <Unsafe_GetLong+0x132>:          adds             ret0=8,r32;;
> 0xc00000003ad84d40:0 <Unsafe_GetLong+0x140>:          add              ret1=r35,r45 <==== r35 is off
> 0xc00000003ad84d40:1 <Unsafe_GetLong+0x141>:          ld8              r35=[r34],24
> 0xc00000003ad84d40:2 <Unsafe_GetLong+0x142>:          nop.i            0x0
> 0xc00000003ad84d50:0 <Unsafe_GetLong+0x150>:          ld8              r41=[ret0];;
> 0xc00000003ad84d50:1 <Unsafe_GetLong+0x151>:          ld8.s            r49=[r34],-24
> 0xc00000003ad84d50:2 <Unsafe_GetLong+0x152>:          nop.i            0x0
> 0xc00000003ad84d60:0 <Unsafe_GetLong+0x160>:          ld8              r39=[ret1];; <=== abort
> 0xc00000003ad84d60:1 <Unsafe_GetLong+0x161>:          ld8              ret0=[r35]
> 0xc00000003ad84d60:2 <Unsafe_GetLong+0x162>:          nop.i            0x0;;
> 0xc00000003ad84d70:0 <Unsafe_GetLong+0x170>:          cmp.ne.unc       p1=r0,ret0;;        M,MI
> 0xc00000003ad84d70:1 <Unsafe_GetLong+0x171>:    (p1)  mov              r48=r41
> 0xc00000003ad84d70:2 <Unsafe_GetLong+0x172>:    (p1)  chk.s.i          r49,Unsafe_GetLong+0x290
>
> (gdb) x /10i $pc-48*2
> 0x9fffffffaf000d70:           flushrs                            MMI
> 0x9fffffffaf000d71:           mov              r44=r32
> 0x9fffffffaf000d72:           mov              r45=r33
> 0x9fffffffaf000d80:           mov              r46=r34           MMI
> 0x9fffffffaf000d81:           mov              r47=r35
> 0x9fffffffaf000d82:           mov              r48=r36
> 0x9fffffffaf000d90:           mov              r49=r37           MMI
> 0x9fffffffaf000d91:           mov              r50=r38
> 0x9fffffffaf000d92:           mov              r51=r39
> 0x9fffffffaf000da0:           adds             r14=0x270,r4      MMI
> (gdb) p /x $r35
> $9 = 0x22
> (gdb) x /x $ret1
> 0x9ffffffe1d0d2bda:     0x677a68676c78743a
> (gdb) x /x $r45+0x22
> 0x9ffffffe1d0d2bda:     0x677a68676c78743a
>
> So here is the problem, this is a 64bit JVM
>
>  0 : /opt/java8/bin/IA64W/java
>  1 : -Djava.util.logging.config.file=/test28/gzh/tomcat/conf/logging.properties
>  2 : -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>  3 : -Dorg.apache.catalina.security.SecurityListener.UMASK=022
>  4 : -server
>  5 : -XX:PermSize=128m
>  6 : -XX:MaxPermSize=256m
>  7 : -Djava.endorsed.dirs=/test28/gzh/tomcat/endorsed
>  8 : -classpath
>  9 : /test28/gzh/tomcat/bin/bootstrap.jar:/test28/gzh/tomcat/bin/tomcat-juli.jar
> 10 : -Dcatalina.base=/test28/gzh/tomcat
> 11 : -Dcatalina.home=/test28/gzh/tomcat
> 12 : -Djava.io.tmpdir=/test28/gzh/tomcat/temp
> 13 : org.apache.catalina.startup.Bootstrap
> 14 : start
>
> Since they are not passing any -Xmx values we are taking defaults which look 
> at the system resources. So what is happening here is a 32 bit word aligned 
> address is being used to index into a byte array
>
> (gdb) jo 0x9ffffffe1d0d2bb8
> _mark = 0x0000000000000001, _klass = 0x9fffffffa8c00768, instance of type [B
> length of the array: 118
> 0 0 0 102 0 0 0 8 0 70 103 122 104 103 108 120 116 58 70 83 78 95 50 48 49 
> 53 49 48 50 50 44 65 44 49 52 52 53 52 55 57 57 51 51 57 53 56 46 52 56 54 
> 55 50 48 51 49 99 57 97 101 52 57 101 97 101 49 100 56 49 51 53 51 99 99 97 
> 97 54 98 56 100 46 4 105 110 102 111 115 101 113 110 117 109 68 117 114 105 
> 110 103 79 112 101 110 0 0 1 80 -6 96 -95 -48 4 0 0 0 0 0 0 0 4
>
> This is the whole string
>
> (gdb) x /2s 0x9ffffffe1d0d2bd8
> 0x9ffffffe1d0d2bd8:      ""
> 0x9ffffffe1d0d2bd9:      "Fgzhglxt:FSN_20151022,A,1445479933958.48672031c9ae49eae1d81353ccaa6b8d.\004infoseqnumDuringOpen"
>
> To me this is a bug in the callee potentially in 
> org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo
>
> Why are they calling Unsafe_GetLong on a byte array,  there is no checking of 
> alignment and I really think this is a bug on their part. As far as I know, 
> GetLong expects 64 bit alignment. I did find some other 64 bit users who saw 
> this with the same stack trace as this customer
> https://issues.apache.org/jira/browse/PHOENIX-1438
> http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.devel/39017
> The fix would go here, by adding a test for ia64. Looking at the code from 
> a bug, they are checking whether the box is sparc:
>
>      static Comparer<byte[]> getBestComparer() {
> +      if (System.getProperty("os.arch").equals("sparc")) {  <====
> +        if (LOG.isTraceEnabled()) {
> +          LOG.trace("Lexicographical comparer selected for "
> +              + "byte aligned system architecture");
> +        }
> +        return lexicographicalComparerJavaImpl();
> +      }
>        try {
>          Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
>
> So this is 'fixable' from a Java class perspective. Hari said he will talk 
> with his open source contact.
>
> This Hadoop bug report points to the same problem in the same code:
> https://issues.apache.org/jira/browse/HADOOP-11466
> In that case the symptom of the unaligned accesses was bad performance 
> instead of a crash. This shows diffs for that fix:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201501.mbox/%[email protected]%3E
> Those diffs show that fix only avoids the bad code when running on "sparc". 
> They really should have instead avoided that bad code for every architecture 
> other than x86. They should not be assuming that the FastByteComparisons 
> enhancement will work on other processors and actually improve performance. 
> On processors that allow unaligned accesses only at a high cost, they are just 
> creating bad performance that will be hard for anyone to ever find.
> For all IA64 customers this will be an issue when running 64 bit. The IA 
> processor enforces alignment on instruction types



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
