When Hadoop is merging spill outputs, or merging map outputs in the reducer, I can see two separate byte arrays being used.
With regard to pass by reference vs. value, you're right: the byte arrays are passed 'by value', but the value passed is a copy of the reference to the byte array, not a copy of the array's contents (if that makes sense).

http://www.javaworld.com/javaworld/javaqa/2000-05/03-qa-0526-pass.html

On Sun, Apr 1, 2012 at 1:32 AM, Jane Wayne <jane.wayne2...@gmail.com> wrote:

> chris,
>
> 1. thanks, that approach to converting my custom key to byte[] works.
>
> 2. on the issue of pass by reference or pass by value (it's been a while
> since i've visited this issue), i'm pretty sure java is pass by value
> (regardless of whether the parameters are primitives or objects). when i
> put the code into the debugger, the ids of byte[] b1 and byte[] b2 are equal.
> if this is indeed the same byte array, why not just pass it as one
> parameter instead of two? unless in some cases, b1 and b2 are not the same.
> this second issue is not terribly important, because the interface
> defines two byte arrays to be passed in, and so there's not much i (we) can
> do about it.
>
> thanks for the help!
>
> On Sat, Mar 31, 2012 at 5:18 PM, Chris White <chriswhite...@gmail.com> wrote:
>
>> You can serialize your Writables to a ByteArrayOutputStream and then
>> get its underlying byte array:
>>
>>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>>     DataOutputStream dos = new DataOutputStream(baos);
>>     Writable myWritable = new Text("text");
>>     myWritable.write(dos);
>>     byte[] bytes = baos.toByteArray();
>>
>> I would recommend writing a few bytes to the DataOutputStream first -
>> i always forget to respect the offset variables (s1 / s2), and this,
>> depending on how well you write your unit test, should allow you to
>> test that you are respecting them.
>>
>> The huge byte arrays store the other Writables in the stream that are
>> about to be run through the comparator.
>>
>> Finally, arrays in java are objects, so you're passing a reference to
>> a byte array, not making a copy of the array.
>>
>> Chris
>>
>> On Sat, Mar 31, 2012 at 12:23 AM, Jane Wayne <jane.wayne2...@gmail.com>
>> wrote:
>> > i have a RawComparator that i would like to unit test (using mockito and
>> > mrunit testing packages). i want to test the method,
>> >
>> > public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2)
>> >
>> > how do i convert my custom key into a byte[] array? is there a util class
>> > to help me do this?
>> >
>> > also, when i put the code into the debugger, i notice that the byte[]
>> > arrays (b1 and b2) are HUGE (the lengths of each array are huge, in the
>> > thousands). what is actually in these byte[] arrays? intuitively, it does
>> > not seem like these byte[] arrays only represent my keys.
>> >
>> > lastly, why are such huge byte[] arrays being passed around? one would
>> > think that since Java is pass-by-value, there would be a large overhead
>> > with passing such large byte arrays around.
>> >
>> > your help is appreciated.
>>
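
To tie the thread together, here is a rough sketch of the kind of offset-aware test discussed above. It is only an illustration under a few assumptions: it uses plain java.io instead of Hadoop classes so that it runs standalone, the custom key is assumed to serialize as a single int, and the names RawCompareSketch, writeKey, serializeWithPadding and readInt are made up for this example. In a real test you would call your Writable's write(dos) and your own RawComparator's compare instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not Hadoop code: it only mirrors the parameter list of
// compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2).
class RawCompareSketch {

    // Assumption: the custom key serializes as one big-endian int, which is
    // what a Writable calling out.writeInt(id) would produce.
    public static void writeKey(DataOutputStream out, int id) throws IOException {
        out.writeInt(id);
    }

    // Writes `padding` filler bytes before the key so that the key does not
    // start at offset 0, as suggested earlier in the thread.
    public static byte[] serializeWithPadding(int padding, int id) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(baos);
            for (int i = 0; i < padding; i++) {
                dos.writeByte(0x7f);
            }
            writeKey(dos, id);
            dos.flush();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes a big-endian int starting at `off`.
    public static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             |  (b[off + 3] & 0xff);
    }

    // Stand-in comparator body. The lengths l1 and l2 go unused here only
    // because the assumed key has a fixed width of 4 bytes.
    public static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        // Case 1: two separate arrays. Both keys sit behind identical padding,
        // so a comparator that ignored s1/s2 would wrongly report equality.
        byte[] a = serializeWithPadding(4, 20);
        byte[] b = serializeWithPadding(4, 10);
        System.out.println(Integer.signum(compare(a, 4, 4, b, 4, 4))); // prints 1

        // Case 2: both keys in ONE buffer. The same array reference is passed
        // as b1 and b2; only the offsets differ, and nothing is copied.
        byte[] shared = new byte[8];
        System.arraycopy(serializeWithPadding(0, 10), 0, shared, 0, 4);
        System.arraycopy(serializeWithPadding(0, 20), 0, shared, 4, 4);
        System.out.println(Integer.signum(compare(shared, 0, 4, shared, 4, 4))); // prints -1
    }
}
```

Case 2 is also why seeing equal ids for b1 and b2 in the debugger is not a problem: passing the same large array twice costs only two reference copies, and the offsets say where each key lives. Case 1 shows why the interface still needs two array parameters for the situations where the keys come from different buffers.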