Re: [OT] IIS7/isapi/tomcat performance
Hi Chuck,

You did not see my earlier response, where I came to the same conclusion about the types after looking at some other sites, including a wiki. Yes, there was some confusion, but now I am clear that it is compiler dependent, as I said earlier.

Thanks,
-Tony

To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
Re: [OT] IIS7/isapi/tomcat performance
Tony,

On 3/1/2011 6:27 PM, Tony Anecito wrote:
> I believe the effect of compression is relative. In other words for a big program with lots of 64-bit pointers and 64-bit longs it helps but for small programs it does not.

A long in Java is always 64 bits. Those /will/ be faster on a 64-bit architecture. The only reason any of this is a problem is that pointers (somewhat) unexpectedly double in size when moving from a 32-bit to a 64-bit platform. If you were running fine in a 128MiB heap on a 32-bit machine, you may well have to increase your heap size on a 64-bit machine just to store the exact same set of objects.

> I would hope the full 64-bit data bus would be used. So you think 32-pins on the processor are not used when running a 32-bit process?

It depends upon exactly what the processor is doing. Those chips with bundled x86 cores will use the x86 core (which is /only/ 32-bit, so there's no option for 64-bit operations). Those chips which have only x86-64 cores will either use 64 bits to manipulate 32-bit data (and effectively waste the 32 most significant bits) or wave their hands wildly and achieve some sort of miracle where 32-bit processes run twice as fast because of a wider word size.

-chris
Re: [OT] IIS7/isapi/tomcat performance
Chuck,

On 3/1/2011 6:09 PM, Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
>> Subject: Re: [OT] IIS7/isapi/tomcat performance
>>
>> I don't understand why communicating a 64-bit value over a 64-bit bus would take longer than communicating a 32-bit value over a 64-bit bus:
>
> Because you get *two* 32-bit values for one transfer, not just one.

If, as you say, Intel can move 64 /bytes/ across a data path (if you prefer that phrase over "the bus"), then the word size really does make a difference here. They should be getting 16 32-bit words across such a data path, or 8 64-bit words. If the pointers are doubling in size, this makes 64-bit mode go slower because you get half the throughput when using word-sized values. Since pointers in general are word-sized, they always suffer while other (usually smaller) data does not.

The key is that the data path(s) are actually much wider than the word size, which I didn't realize.

>> I also get that some processors (like Itanium) have an x84 processor core on the die
>
> (Presumably, you meant x86.) Sorry, Itanium was notoriously bad at running 32-bit apps.

I did mean x86. Lots of typing yesterday. The new Itaniums are supposed to be actually worth it, though.

>> getting the data from point A to point B shouldn't matter
>
> Sure it does, if you can batch multiple operand accesses together (which current Intel cores do).
>
>> I suppose if the CPU knew it was in a 32-bit mode, it could adjust the number of clock ticks it had to wait around for 32-bit data to go through an adder, but that seems overly complicated for a straightforward CPU task.
>
> Simple adders have only used one cycle for decades, regardless of the width.

If the clock tick is long enough :)

-chris
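The "16 32-bit words or 8 64-bit words" arithmetic above can be written out as a small sketch. The 64-byte path width is the figure quoted in this thread; the class and method names are made up for illustration, and this models nothing about how a real memory subsystem schedules transfers.

```java
public class DataPathWidth {
    // How many whole values of the given size fit in one transfer
    // across a data path of the given width (both in bytes).
    static int valuesPerTransfer(int pathBytes, int valueBytes) {
        return pathBytes / valueBytes;
    }

    public static void main(String[] args) {
        int pathBytes = 64; // the 64-byte data path mentioned in the thread

        // Integer.BYTES is 4 and Long.BYTES is 8 on every JVM.
        System.out.println("32-bit values per transfer: "
                + valuesPerTransfer(pathBytes, Integer.BYTES));
        System.out.println("64-bit values per transfer: "
                + valuesPerTransfer(pathBytes, Long.BYTES));
    }
}
```

Under that assumption, a structure made mostly of pointers needs about twice as many transfers once the pointers grow from 4 to 8 bytes, while data that was already 64 bits wide is unaffected.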
Re: [OT] IIS7/isapi/tomcat performance
Actually, according to the IBM porting guide, longs have different byte lengths depending upon which frame of reference they are speaking to. Page 4 of the following porting guide:

http://public.dhe.ibm.com/software/dw/jdk/64bitporting/64BitJavaPortingGuide.pdf

states:

> For Windows, on 32-bit systems, integers, longs and pointers are all 32-bits. On 64-bit systems, integers and longs remain 32-bits, but pointers become 64-bits and long longs are 64-bits.
>
> For AIX and Linux, on 32-bit systems, integers, longs and pointers are all 32-bits. On 64-bit systems, integers remain 32-bits and longs and pointers become 64-bits.

I could have interpreted this wrong, but from an OS standpoint, for native code, this is what they said. Now, if the byte code is translated to native code (which it must be to run), this would explain why Windows might seem to run faster than Linux for 64-bit.

Regarding bus usage, I agree with Chuck's explanation of how the processors use it, and I said so in a previous message.

Regards,
-Tony
Re: [OT] IIS7/isapi/tomcat performance
According to the wiki, a Java long is 64 bits; I am not sure what a Long is. So IBM is talking about C and C++, where a long is 32 bits, which is what the paper meant. So I was wrong.

Regards,
-Tony
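On the open question of what a `Long` is: in Java, `long` is the 64-bit primitive and `java.lang.Long` is its object wrapper from the standard library. A minimal sketch (the class and method names here are invented for the example):

```java
public class LongVersusLong {
    // Returns the runtime class name of a boxed long value.
    static String boxedTypeName(long value) {
        Long boxed = value;              // autoboxing: primitive long -> java.lang.Long
        return boxed.getClass().getName();
    }

    // Round-trips a value through the wrapper and back to the primitive.
    static long roundTrip(long value) {
        Long boxed = Long.valueOf(value);
        return boxed.longValue();        // explicit unboxing
    }

    public static void main(String[] args) {
        long big = 1L << 40;             // a value that does not fit in 32 bits
        System.out.println("Long.SIZE = " + Long.SIZE + " bits");
        System.out.println("boxed type = " + boxedTypeName(big));
        System.out.println("round trip ok = " + (roundTrip(big) == big));
    }
}
```

The wrapper is an ordinary heap object, so a field of type `Long` holds a reference (whose size depends on the JVM), whereas a field of type `long` always holds the 64-bit value itself.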
RE: [OT] IIS7/isapi/tomcat performance
From: Tony Anecito [mailto:adanec...@yahoo.com]
Subject: Re: [OT] IIS7/isapi/tomcat performance

> On page 4 of the following port guide:
> http://public.dhe.ibm.com/software/dw/jdk/64bitporting/64BitJavaPortingGuide.pdf
> It states: For Windows, on 32-bit systems, integers, longs and pointers are all 32-bits. On 64-bit systems, integers and longs remain 32-bits, but pointers become 64-bits and long longs are 64-bits.

That ancient porting guide is misleading in several respects, one in particular being its implication that the OS determines the size of language-specific types. That is incorrect; it's the *compiler* being used that makes that determination, not the platform or the OS.

> This would explain why Windows might seem to run faster than Linux for 64-bit.

Sorry, that's completely false. As Chris pointed out, Java non-reference type sizes are fixed, and are completely independent of the platform the Java program is running on. The only thing that changes between a 32-bit JVM and a 64-bit one is the size of a reference (pointer). Even in the C and C++ code that makes up the core of the JVM, the programmers studiously avoid use of ambiguous C types such as int and long anywhere that it might make a difference, and instead use explicitly sized types.

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
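The point that Java's non-reference type sizes are fixed can be checked from inside any JVM. The sketch below uses only the standard `SIZE` constants on the wrapper classes; the class name is invented for the example. The `sun.arch.data.model` property is HotSpot-specific and may be absent on other JVMs, so treat that last line as an assumption about the runtime.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrimitiveWidths {
    // Widths in bits of the Java primitive types, as reported by the
    // standard library constants. These are fixed by the language.
    static Map<String, Integer> primitiveWidths() {
        Map<String, Integer> widths = new LinkedHashMap<>();
        widths.put("byte", Byte.SIZE);
        widths.put("short", Short.SIZE);
        widths.put("char", Character.SIZE);
        widths.put("int", Integer.SIZE);
        widths.put("long", Long.SIZE);
        widths.put("float", Float.SIZE);
        widths.put("double", Double.SIZE);
        return widths;
    }

    public static void main(String[] args) {
        primitiveWidths().forEach((type, bits) ->
                System.out.println(type + ": " + bits + " bits"));

        // What does vary between runs is the JVM itself.
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        // HotSpot-specific; may print null on other JVMs.
        System.out.println("sun.arch.data.model = "
                + System.getProperty("sun.arch.data.model"));
    }
}
```

Running this on a 32-bit and a 64-bit JVM should print the same widths for every primitive; only the architecture properties differ.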
Re: [OT] IIS7/isapi/tomcat performance
Tony,

On 2/28/2011 2:57 PM, Tony Anecito wrote:
> Since the memory pointers are larger you may need to increase your heap size but you can compress the address pointers.

+1

> Also, if you use JNI and it is 32-bit then you will have unexpected issues; same thing with any native libs you try to use.

+1

> Generally it will be up to 20% slower due to the pointers.

Can you explain that claim? Unless the OP is using compressed pointers (which will require a decode in order to dereference), why would the performance drop when using 64-bit pointers instead of 32-bit pointers? Presumably, the CPU has 64-bit (or bigger) registers and can handle 64-bit numbers just as fast as 32-bit numbers. Or do modern CPUs run in a 32-bit mode where the hardware doesn't bother to add-out to the 33+ bits?

-chris
Re: [OT] IIS7/isapi/tomcat performance
Hi Chris,

The performance degradation for 64-bit versus 32-bit has been the subject of much concern in the Java community. Here is the number I mentioned, straight from Oracle itself:

http://www.oracle.com/technetwork/java/hotspotfaq-138619.html

> What are the performance characteristics of 64-bit versus 32-bit VMs?
>
> Generally, the benefits of being able to address larger amounts of memory come with a small performance loss in 64-bit VMs versus running the same application on a 32-bit VM. This is due to the fact that every native pointer in the system takes up 8 bytes instead of 4. The loading of this extra data has an impact on memory usage which translates to slightly slower execution depending on how many pointers get loaded during the execution of your Java program. The good news is that with AMD64 and EM64T platforms running in 64-bit mode, the Java VM gets some additional registers which it can use to generate more efficient native instruction sequences. These extra registers increase performance to the point where there is often no performance loss at all when comparing 32 to 64-bit execution speed.
>
> The performance difference comparing an application running on a 64-bit platform versus a 32-bit platform on SPARC is on the order of 10-20% degradation when you move to a 64-bit VM. On AMD64 and EM64T platforms this difference ranges from 0-15% depending on the amount of pointer accessing your application performs.

If you google using the keywords:

java 64-bit vs 32-bit performance

you will find a lot of discussion about this.

Regards,
-Tony
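To put the "8 bytes instead of 4" remark in numbers, here is a back-of-the-envelope sketch. It counts only the bytes occupied by the references themselves and deliberately ignores object headers, alignment padding, and everything else a real JVM adds; the reference count is an arbitrary figure chosen for the example, and the class name is invented.

```java
public class ReferenceFootprint {
    // Bytes occupied by the given number of references at the given
    // reference width. Object headers and padding are not modelled.
    static long referenceBytes(long referenceCount, int bytesPerReference) {
        return referenceCount * bytesPerReference;
    }

    public static void main(String[] args) {
        long referenceCount = 10_000_000L; // arbitrary example figure

        long narrow = referenceBytes(referenceCount, 4); // 32-bit references
        long wide = referenceBytes(referenceCount, 8);   // 64-bit references

        System.out.println("4-byte references: " + narrow + " bytes");
        System.out.println("8-byte references: " + wide + " bytes");
        System.out.println("extra data to load: " + (wide - narrow) + " bytes");
    }
}
```

The extra bytes are what has to travel through the caches and data paths, which is the mechanism the FAQ gives for the slowdown; how much it matters depends on how pointer-heavy the application is.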
Re: [OT] IIS7/isapi/tomcat performance
Also, I have not programmed in assembly language or in hexadecimal for some time, but I would hope that for a 32-bit Java process running on a 64-bit processor I would fetch a 32-bit pointer and maybe a 32-bit long on a 64-bit data bus. Remember, we are talking about pointers in code coming into the processor via the data bus.

Interestingly enough, for AIX and Linux a long is 64-bit for 64-bit Java, versus 32-bit for 64-bit Windows. So it looks like Linux would be slower than Windows. See:

http://public.dhe.ibm.com/software/dw/jdk/64bitporting/64BitJavaPortingGuide.pdf

Bottom line: how much worse things get depends upon how many pointers and longs are used in 64-bit Java.

Regards,
-Tony
Re: [OT] IIS7/isapi/tomcat performance
Tony,

On 3/1/2011 3:24 PM, Tony Anecito wrote:
> Also, I have not programmed in assembly language or in hexadecimal for some time but I would hope that for a 32-bit java process running on a 64-bit processor I would fetch a 32-bit pointer and maybe a 32-bit long on a 64-bit data bus. Remember we are talking about pointers in code coming into the processor via the data bus.

The bus on a 64-bit architecture had better be at least 64 bits wide, otherwise nothing works right. They used to run 64-bit OSs on 32-bit hardware and everything took twice as long because the bus was only 32-bit and so every piece of (64-bit) data took double the time to transmit. Booting 64-bit WinNT would take a looong time.

I don't understand why communicating a 64-bit value over a 64-bit bus would take longer than communicating a 32-bit value over a 64-bit bus: the clock speed of the bus is the same... the only difference between the two scenarios is that the user doesn't care about the upper 32 bits of data.

The only thing that makes sense to me intuitively at this point (I'm still reading) is that using compressed object pointers slows things down.

> Interesting enough for AIX and Linux a long is 64bit for 64-bit java versus 32-bit for 64-bit windows. So it looks like for Linux it would be slower than windows. See:
> http://public.dhe.ibm.com/software/dw/jdk/64bitporting/64BitJavaPortingGuide.pdf

That's interesting, though it doesn't specify what compiler is being used. The only thing that makes a long value 32-bit or 64-bit is the compiler compiling the code where the word "long" is present. Java fixes the size of all primitive data types, so a Java long is always 64 bits regardless of the architecture. ISO C declares that long is at least 32-bit, short is at least 16-bit, and plain-old int is somewhere in between whatever short and long turn out to be.

That document seems to imply that the OS decides what the type widths are, and that only matters when interfacing with OS calls: if you call brk() and it expects a 64-bit value, and you provide a 32-bit one, bad things will happen.

> Bottom line on how much worse things get is based upon how many pointers and longs are used for 64-bit java that are used.

I still don't get why moving 64-bit values around is slower than moving 32-bit values around: the bus is 64 bits no matter what mode you're in.

I *do* get that compressed pointers slow things down. I *do* get that the heap will grow somewhere approaching twice the size as in a 32-bit JVM. I also get that some processors (like Itanium) have an x84 processor core on the die, so that processor can avoid (uselessly) performing 64-bit operations on 32-bit data, but getting the data from point A to point B shouldn't matter.

Also, performing 64-bit operations on 32-bit data should take just as long as performing 64-bit operations on 64-bit data: the ALU goes as fast as it's designed to go. I suppose if the CPU knew it was in a 32-bit mode, it could adjust the number of clock ticks it had to wait around for 32-bit data to go through an adder, but that seems overly complicated for a straightforward CPU task.

-chris
Re: [OT] IIS7/isapi/tomcat performance
Hi Chris,

I guess you have not read my last email yet. I think of it as putting two 32-bit pieces of info on a 64-bit data bus, whereas two 64-bit pieces of information take two fetches, or twice as long, on the same hardware. Depending upon the number of bytes for each data type in 32-bit versus 64-bit, a 20% performance reduction makes sense.

As for compressing the pointers, all I read is that it improves response time, so that maybe running on 64-bit Java the program is only 1% slower. I am assuming the pointers are compressed after the first pass or even before the byte code is run.

Regards,
-Tony
Re: [OT] IIS7/isapi/tomcat performance
Tony,

On 3/1/2011 4:19 PM, Tony Anecito wrote:
> I guess you have not read my last email yet. I think of it as putting two 32-bit pieces of info on a 64-bit data bus whereas for two 64-bit pieces of information it takes two fetches or twice as long on the same hardware.

Are you saying that a 32-bit JVM running on a 64-bit machine somehow utilizes the 64-bit bus? Malarkey. Perhaps the CPU as part of its instruction re-ordering can do this, but I seriously doubt that a 32-bit process on a 64-bit CPU gains a performance boost over that same 32-bit process running on a 32-bit CPU (which is what the above would imply).

> As for compressing the pointers all I read is it improves response time

I can't believe that for a second. It actually slows things down. The only reason to compress pointers is so that your heap size doesn't roughly double when switching to 64-bit.

The problem is that while the transition from 32-bit to 64-bit architecture now allows many orders of magnitude more memory to be accessed by each process (this is especially important for Java heaps), the amount of memory installed in servers has not really changed. 5 years ago, it wasn't uncommon for a 32-bit server to have 32GiB of memory. These days, a similar 64-bit server might still only have 32GiB of memory.

> so that maybe running on 64-bit java the program is only 1% slower. I am assuming the pointers are compressed after the first pass or even before the byte code is run.

The pointers are compressed as the objects (really the references to them) are created. The problem is that they must be uncompressed for every dereference. It has nothing to do with the bytecode.

-chris
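The compress-on-create, decode-on-dereference cycle described above can be illustrated with a toy model. This is only a sketch under stated assumptions: objects are taken to be 8-byte aligned, a reference is stored as a 32-bit offset from a heap base shifted right by 3, and all names are invented. A real JVM picks its own base and shift and performs the decode in generated machine code, not in Java.

```java
public class CompressedReferenceSketch {
    // Assumption: objects are aligned on 8-byte boundaries, so the low
    // three bits of every (address - heapBase) offset are zero.
    static final int ALIGNMENT_SHIFT = 3;

    // Encode a 64-bit address as a 32-bit value relative to heapBase.
    static int compress(long address, long heapBase) {
        return (int) ((address - heapBase) >>> ALIGNMENT_SHIFT);
    }

    // Decode: treat the 32 bits as unsigned, shift back, add the base.
    // This extra arithmetic is the cost paid on each dereference.
    static long decompress(int compressed, long heapBase) {
        return heapBase + ((compressed & 0xFFFF_FFFFL) << ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        long heapBase = 0x1_0000_0000L;   // hypothetical heap start
        long address = heapBase + 24;     // an 8-byte-aligned object address

        int narrow = compress(address, heapBase);
        System.out.println("compressed = " + narrow);
        System.out.println("round trip ok = "
                + (decompress(narrow, heapBase) == address));

        // 2^32 distinct values times 8-byte granularity = 32 GiB reachable.
        System.out.println("reachable bytes = " + ((1L << 32) << ALIGNMENT_SHIFT));
    }
}
```

In this model the stored reference is half the size, which is the heap-footprint benefit, while every use of it costs a mask, a shift, and an add, which is the decode overhead being debated in this thread.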
RE: [OT] IIS7/isapi/tomcat performance
From: Christopher Schultz [mailto:ch...@christopherschultz.net]
Subject: Re: [OT] IIS7/isapi/tomcat performance

> Are you saying that a 32-bit JVM running on a 64-bit machine somehow utilizes the 64-bit bus? Malarkey.

I wouldn't bet on that. Intel goes to great pains to ensure all of the buses are fully utilized. On a 64-bit machine, all of the data paths from RAM up to the L1 operand cache will be able to move twice the number of items per cycle when the items are only 32 bits wide. Between the L1 cache and the superscalar execution core, there may be less of a gain, but since the core contains three ALUs and separate load and store sections to service them, memory operations are combined wherever possible to get data in and out as fast as possible.

 - Chuck
RE: [OT] IIS7/isapi/tomcat performance
From: Christopher Schultz [mailto:ch...@christopherschultz.net]
Subject: Re: [OT] IIS7/isapi/tomcat performance

> I don't understand why communicating a 64-bit value over a 64-bit bus would take longer than communicating a 32-bit value over a 64-bit bus:

Because you get *two* 32-bit values for one transfer, not just one.

BTW, it's somewhat pointless to use the unqualified term "bus" when referring to modern CPU architecture. Now that Intel has finally figured out how to make multi-processor systems run at a reasonable speed by using techniques we implemented back in the 1960s, along with the advent of multiple memory cache levels, there's no longer a single bus to be concerned with. Most of them are wider than 64 bits in order to move as much data as possible; even ten years ago, Intel was moving 64 _bytes_ at a time on most of the data paths.

> I also get that some processors (like Itanium) have an x84 processor core on the die

(Presumably, you meant x86.) Sorry, Itanium was notoriously bad at running 32-bit apps.

> getting the data from point A to point B shouldn't matter

Sure it does, if you can batch multiple operand accesses together (which current Intel cores do).

> I suppose if the CPU knew it was in a 32-bit mode, it could adjust the number of clock ticks it had to wait around for 32-bit data to go through an adder, but that seems overly complicated for a straightforward CPU task.

Simple adders have only used one cycle for decades, regardless of the width.

 - Chuck
Re: [OT] IIS7/isapi/tomcat performance
Chuck,

On 3/1/2011 5:42 PM, Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
>> Subject: Re: [OT] IIS7/isapi/tomcat performance
>>
>> Are you saying that a 32-bit JVM running on a 64-bit machine somehow utilizes the 64-bit bus? Malarkey.
>
> I wouldn't bet on that. Intel goes to great pains to insure all of the buses are fully utilized. On a 64-bit machine, all of the data paths from RAM up to the L1 operand cache will be able to move twice the number of items per cycle when the items are only 32 bits wide.

The question I have is how does the bus controller know that there are multiple 32-bit values coming down the line, and that it can send them simultaneously down the bus? There's more data to be sent over the bus than just pointers to other pieces of data. You have to move the instruction itself, etc., so there are lots of opportunities for other data to get in the way of this DRR-style data transfer across the bus.

> Between the L1 cache and the superscalar execution core, there may be less of a gain, but since the core contains three ALUs and separate load and store sections to service them, memory operations are combined wherever possible to get data in and out as fast as possible.

I buy this argument, but that would only affect the processing of, say, a 64-bit pointer within the core... not the speed of passing that pointer around the rest of the machine. As you say, probably less of a gain.

I'd love to see some real documentation and/or testing on this type of stuff. I certainly am somewhat naïve when it comes to details this low, but my intuition tells me that the CPU and bus aren't magic :)

-chris
Re: [OT] IIS7/isapi/tomcat performance
I believe the effect of compression is relative. In other words, for a big program with lots of 64-bit pointers and 64-bit longs it helps, but for small programs it does not.

I would hope the full 64-bit data bus would be used. So you think 32 pins on the processor are not used when running a 32-bit process? I am not saying you are not correct, but I will check into it since I am curious and let you know what I find.

I have never mentioned byte code as pointers; all my references are to executable code, or what the processor actually runs.

Regards,
-Tony

----- Original Message -----
From: Christopher Schultz ch...@christopherschultz.net
To: Tomcat Users List users@tomcat.apache.org
Sent: Tue, March 1, 2011 3:00:53 PM
Subject: Re: [OT] IIS7/isapi/tomcat performance

Tony,

On 3/1/2011 4:19 PM, Tony Anecito wrote:
> I guess you have not read my last email yet. I think of it as putting two 32-bit pieces of info on a 64-bit data bus whereas for two 64-bit pieces of information it takes two fetches or twice as long on the same hardware.

Are you saying that a 32-bit JVM running on a 64-bit machine somehow utilizes the 64-bit bus? Malarkey. Perhaps the CPU as part of its instruction re-ordering can do this, but I seriously doubt that a 32-bit process on a 64-bit CPU gains a performance boost over that same 32-bit process running on a 32-bit CPU (which is what the above would imply).

> As for compressing the pointers all I read is it improves response time

I can't believe that for a second. It actually slows things down. The only reason to compress pointers is so that your heap size doesn't roughly double when switching to 64-bit.

The problem is that while the transition from 32-bit to 64-bit architecture now allows many orders of magnitude more memory to be accessed by each process (this is especially important for Java heaps), the amount of memory installed in servers has not really changed. Five years ago, it wasn't uncommon for a 32-bit server to have 32GiB of memory. These days, a similar 64-bit server might still only have 32GiB of memory.

> so that maybe running on 64-bit java the program is only 1% slower. I am assuming the pointers are compressed after the first pass or even before the byte code is run.

The pointers are compressed as the objects (really the references to them) are created. The problem is that they must be uncompressed for every dereference. It has nothing to do with the bytecode.

-chris
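[Editorial sketch] The compress-on-store, uncompress-on-dereference cycle described above can be modelled in a few lines of Java. This is only an illustration of the idea, not the JVM's actual implementation: the class name, the heap base address, and the 8-byte alignment are all assumptions made for the example. It also shows the arithmetic behind the "heap roughly doubles" concern, counting only the reference slots themselves.

```java
// Illustrative model of compressed references; NOT the JVM's real code.
final class CompressedRefSketch {
    // Assumption: objects are 8-byte aligned, so the low three bits of
    // every object address are zero and need not be stored.
    static final int ALIGNMENT_SHIFT = 3;

    // Hypothetical heap base address, chosen only for illustration.
    static final long HEAP_BASE = 0x0000_0008_0000_0000L;

    // Compress: performed when a reference is stored.
    static int encode(long address) {
        return (int) ((address - HEAP_BASE) >>> ALIGNMENT_SHIFT);
    }

    // Uncompress: performed on every dereference (one shift, one add).
    static long decode(int compressed) {
        // Mask to treat the 32-bit value as unsigned before shifting.
        return HEAP_BASE + ((compressed & 0xFFFF_FFFFL) << ALIGNMENT_SHIFT);
    }

    // Bytes occupied by the reference slots alone for `count` references.
    static long referenceBytes(long count, int bytesPerReference) {
        return count * bytesPerReference;
    }

    public static void main(String[] args) {
        long address = HEAP_BASE + 8L * 0x1234_5678L;
        int compressed = encode(address);
        System.out.println("round trip ok: " + (decode(compressed) == address));

        long refs = 10_000_000L;
        System.out.println("32-bit refs: " + referenceBytes(refs, 4) + " bytes");
        System.out.println("64-bit refs: " + referenceBytes(refs, 8) + " bytes");
    }
}
```

Under these assumptions a 32-bit compressed value can address 2^32 slots of 8 bytes each, i.e. 32GiB of heap, at the cost of the extra shift and add on each dereference.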
Re: [OT] IIS7/isapi/tomcat performance
Thanks Chuck, I agree. I used to design hardware back in the '80s to mid-'90s, so I understand what you are saying, but I have not kept up with actual designs since then. I jumped over to software after that. I know I simplify some things but hope I am still correct. Feel free to correct me; I will try not to get emotional about it, but I do miss my 8080 and Z80.

-Tony

----- Original Message -----
From: Caldarale, Charles R chuck.caldar...@unisys.com
To: Tomcat Users List users@tomcat.apache.org
Sent: Tue, March 1, 2011 4:09:10 PM
Subject: RE: [OT] IIS7/isapi/tomcat performance

> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Subject: Re: [OT] IIS7/isapi/tomcat performance
>
> I don't understand why communicating a 64-bit value over a 64-bit bus would take longer than communicating a 32-bit value over a 64-bit bus:

Because you get *two* 32-bit values for one transfer, not just one.

BTW, it's somewhat pointless to use the unqualified term "bus" when referring to modern CPU architecture. Now that Intel has finally figured out how to make multi-processor systems run at a reasonable speed by using techniques we implemented back in the 1960s, along with the advent of multiple memory cache levels, there's no longer a single bus to be concerned with. Most of them are wider than 64 bits in order to move as much data as possible; even ten years ago, Intel was moving 64 _bytes_ at a time on most of the data paths.

> I also get that some processors (like Itanium) have an x84 processor core on the die

(Presumably, you meant x86.) Sorry, Itanium was notoriously bad at running 32-bit apps.

> getting the data from point A to point B shouldn't matter

Sure it does, if you can batch multiple operand accesses together (which current Intel cores do).

> I suppose if the CPU knew it was in a 32-bit mode, it could adjust the number of clock ticks it had to wait around for 32-bit data to go through an adder, but that seems overly complicated for a straightforward CPU task.

Simple adders have only used one cycle for decades, regardless of the width.

- Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
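[Editorial sketch] The "two 32-bit values for one transfer" point can be pictured in software by packing two ints into one 64-bit word. This is only an analogy in Java, with a hypothetical class name and a fixed 64-bit transfer width assumed for the arithmetic; as noted in the message, real data paths are often much wider than 64 bits, and the hardware does this combining itself.

```java
// Software analogy for moving two 32-bit operands in one 64-bit transfer.
final class PackedTransferSketch {
    // Pack two 32-bit values into one 64-bit word:
    // `high` goes in the upper half, `low` in the lower half.
    static long pack(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFF_FFFFL);
    }

    // Recover the upper 32 bits.
    static int high(long word) {
        return (int) (word >>> 32);
    }

    // Recover the lower 32 bits.
    static int low(long word) {
        return (int) word;
    }

    // Number of 64-bit transfers needed to move `count` operands of the
    // given width (32 or 64 bits), assuming a 64-bit-wide path.
    static long transfersNeeded(long count, int operandBits) {
        long perTransfer = 64 / operandBits;
        return (count + perTransfer - 1) / perTransfer; // round up
    }

    public static void main(String[] args) {
        long word = pack(-1, 7);
        System.out.println(high(word) + " " + low(word));
        System.out.println("32-bit operands: " + transfersNeeded(1000, 32) + " transfers");
        System.out.println("64-bit operands: " + transfersNeeded(1000, 64) + " transfers");
    }
}
```

With these assumptions, the same number of operands needs half as many transfers at 32 bits as at 64 bits, which is the effect being argued about in this thread.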
RE: [OT] IIS7/isapi/tomcat performance
> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
> Subject: Re: [OT] IIS7/isapi/tomcat performance
>
> The question I have is how does the bus controller know that there are multiple 32-bit values coming down the line, and that it can send them simultaneously down the bus?

A traditional bus controller hasn't been used in quite some time, and buses themselves are rapidly being replaced by point-to-point connections (finally), at least in terms of CPUs accessing memory. The interface between the L1 operand cache and the multiple ALUs is under control of a scheduler that's aware of the possible 72 simultaneous loads and stores going on, so it can combine accesses as it sees fit. Accesses between lower-level caches and actual RAM have always been wider than the data path within a core.

> There's more data to be sent over the bus than just pointers to other pieces of data.

Of course - except there is no "the bus".

> You have to move the instruction itself

Not these days. The instruction will be loaded from memory once, broken (and combined) into micro-ops, and those are stored in the instruction cache. If you're getting i-cache misses much beyond single-digit percentages, your performance will be horrible.

> so there's lots of opportunities for other data to get in the way of this DRR-style data transfer across the bus.

Your continued use of the phrase "the bus" is rather quaint...

> that would only affect the processing of, say, a 64-bit pointer within the core...

No, it affects all data, not just pointers.

> I'd love to see some real documentation and/or testing on this type of stuff.

http://www.intel.com/products/processor/manuals/

Start with this one: http://www.intel.com/Assets/PDF/manual/253665.pdf

> my intuition tells me that the CPU and bus aren't magic :)

Compared to just a few years ago, they are.

- Chuck