On 11/10/08 1:30 AM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:

> It sounds like you think the 64- and 32-bit environments are effectively
> interchangeable. May I ask why you are using both? The 64-bit environment
> gives you access to more memory; do you see faster performance for the TTs
> in 32-bit mode? Do you get bit by library compatibility bugs that others
> should watch out for in running a dual-mode Hadoop environment?
Some random thoughts on our mixed environment:

A) The vast majority of user-provided (legacy) code is 32-bit. Since you can't mix 64- and 32-bit objects at link or run time, it just makes sense for us to run TTs, etc., as 32-bit by default to get the most bang for our buck.

B) In the case of the data node, memory usage is small enough that the 64-bit JVM isn't needed.

C) Since we currently run HOD, it is possible for users to switch their bit-ness, and I think we have a handful of users that do. We'll probably lose this capability when we go back to a static job tracker. :(

D) For streaming jobs, the bit-ness of the JVM is irrelevant to the user code; 32-bit is better due to the smaller footprint, since streaming jobs eat memory like it was candy. :)

E) We load both the 64-bit and 32-bit versions of libraries on our nodes, allowing us to change our bit-ness whenever we like. This makes for a fat image (so no RAM disk for the OS for us!), but given the streaming VM issues, it works out mostly in our favor anyway.

In general, 64-bit code runs slower than 32-bit code (larger pointers mean a bigger heap footprint and worse cache behavior). So unless one needs more memory than a 32-bit JVM can address, or has external dependencies (JNI libraries, whatever), 32-bit is the way to go for your Java environment. The name node, and maybe a static job tracker, are the potential problem children here, and the places where I suspect most people will be using the 64-bit JVM.
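For what it's worth, the per-daemon split described above can be pinned in conf/hadoop-env.sh using the Sun JVM's -d32/-d64 data-model selectors, assuming a dual-mode JDK and both 32- and 64-bit library trees are installed. Treat the exact variable names and flags below as a sketch of our setup, not a recipe:

```shell
# conf/hadoop-env.sh -- sketch, assumes a dual-mode (32/64-bit) Sun JDK.
# HADOOP_OPTS is picked up by every daemon, so default everything
# (TaskTracker, DataNode, ...) to the 32-bit JVM for the smaller footprint.
export HADOOP_OPTS="-d32 $HADOOP_OPTS"

# The name node (and a static job tracker) are where the heap actually
# outgrows 32-bit addressing, so force those two to the 64-bit JVM.
export HADOOP_NAMENODE_OPTS="-d64 $HADOOP_NAMENODE_OPTS"
export HADOOP_JOBTRACKER_OPTS="-d64 $HADOOP_JOBTRACKER_OPTS"
```

The per-daemon *_OPTS variables are appended after HADOOP_OPTS by the launcher scripts, so the later -d64 wins for those two daemons while everything else stays 32-bit.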