When Hadoop is merging spill outputs, or merging map outputs in the
reducer, I can see two separate byte arrays being used.
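
To make the offsets concrete, here is a minimal sketch that uses only
java.io, so it runs without Hadoop on the classpath. The key format (a
single int written with writeInt), the class name and the helper names
are all my own assumptions for illustration; in a real test you would
call myWritable.write(dos) and pass the bytes to your own RawComparator.
It shows the same compare signature being called once with both keys in
one array and once with the keys in two different arrays.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class OffsetCompareSketch {

    // Hypothetical key format: one 4-byte big-endian int, as written by
    // DataOutputStream.writeInt. Reads it starting at offset s.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
             | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    // Same parameter list as RawComparator.compare: each key is a
    // (buffer, start, length) triple, and the two buffers may or may not
    // be the same array.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Writes `padding` junk bytes first, then the keys, so that no key
    // starts at offset 0 and a comparator that ignores s1/s2 is caught.
    static byte[] serialize(int padding, int... keys) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(baos);
            for (int i = 0; i < padding; i++) {
                dos.writeByte(0x7f);
            }
            for (int k : keys) {
                dos.writeInt(k);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] shared = serialize(3, 10, 20); // keys at offsets 3 and 7
        byte[] other = serialize(5, 20);      // key at offset 5

        // Both keys live in the same array (b1 == b2).
        System.out.println(compare(shared, 3, 4, shared, 7, 4) < 0);
        // Keys live in two different arrays.
        System.out.println(compare(shared, 7, 4, other, 5, 4) == 0);
    }
}
```

Because the keys never start at offset 0, a comparator that reads from
the start of the array instead of from s1 / s2 would compare the padding
bytes and give the wrong answer.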

With regard to pass by reference vs. value, you're right: the byte
arrays are passed 'by value', but the value passed is a copy of the
reference to the byte array, not a copy of the array itself.

http://www.javaworld.com/javaworld/javaqa/2000-05/03-qa-0526-pass.html
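
A small sketch of what that means in practice (the class and method
names are made up for illustration). Writing through the parameter
changes the caller's array, because both references point at the same
object, while reassigning the parameter only changes the method's local
copy of the reference.

```java
class PassByValueSketch {

    // Writes through the reference: the caller sees the change, because
    // the parameter points at the same array object.
    static void mutate(byte[] b) {
        b[0] = 42;
    }

    // Reassigns only the local copy of the reference: the caller's
    // variable still points at the original array.
    static void reassign(byte[] b) {
        b = new byte[] {99};
    }

    // True when both parameters refer to the very same array object,
    // which is what equal ids in a debugger indicate.
    static boolean sameArray(byte[] b1, byte[] b2) {
        return b1 == b2;
    }

    public static void main(String[] args) {
        byte[] data = {1};

        mutate(data);
        System.out.println(data[0]); // 42

        reassign(data);
        System.out.println(data[0]); // still 42

        System.out.println(sameArray(data, data));          // true
        System.out.println(sameArray(data, data.clone()));  // false
    }
}
```

So passing a byte[] with thousands of elements costs the same as passing
any other reference; the array contents are never copied by the call.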


On Sun, Apr 1, 2012 at 1:32 AM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
> chris,
>
> 1. thanks, that approach to converting my custom key to byte[] works.
>
> 2. on the issue of pass by reference or pass by value, (it's been a while
> since i've visited this issue), i'm pretty sure java is pass by value
> (regardless of whether the parameters are primitives or objects). when i
> put the code into debugger, the ids of byte[] b1 and byte[] b2 are equal.
> if this is indeed the same byte array, why not just pass it as one
> parameter instead of two? unless in some cases, b1 and b2 are not the same.
> this second issue is not terribly important, because the interface
> defines two byte arrays to be passed in, and so there's not much i (we) can
> do about it.
>
> thanks for the help!
>
> On Sat, Mar 31, 2012 at 5:18 PM, Chris White <chriswhite...@gmail.com> wrote:
>
>> You can serialize your Writables to a ByteArrayOutputStream and then
>> get its underlying byte array:
>>
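>> import java.io.ByteArrayOutputStream;
>> import java.io.DataOutputStream;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.io.Writable;
>>
>> ByteArrayOutputStream baos = new ByteArrayOutputStream();
>> DataOutputStream dos = new DataOutputStream(baos);
>> Writable myWritable = new Text("text");
>> myWritable.write(dos);
>> byte[] bytes = baos.toByteArray(); // copy of the serialized bytes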
>>
>> I would recommend writing a few bytes to the DataOutputStream first -
>> i always forget to respect the offset variables (s1 / s2), and this,
>> depending on how well you write your unit test, should allow you to
>> test that you are respecting them.
>>
>> The huge byte arrays hold the other serialized Writables in the stream
>> that are about to be run through the comparator.
>>
>> Finally, arrays in java are objects, so you're passing a reference to
>> a byte array, not making a copy of the array.
>>
>> Chris
>>
>> On Sat, Mar 31, 2012 at 12:23 AM, Jane Wayne <jane.wayne2...@gmail.com>
>> wrote:
>> > i have a RawComparator that i would like to unit test (using mockito and
>> > mrunit testing packages). i want to test the method,
>> >
>> > public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2)
>> >
>> > how do i convert my custom key into a byte[] array? is there a util class
>> > to help me do this?
>> >
>> > also, when i put the code into the debugger, i notice that the byte[]
>> > arrays (b1 and b2) are HUGE (the lengths of each array are huge, in the
>> > thousands). what is actually in these byte[] arrays? intuitively, it does
>> > not seem like these byte[] arrays only represent my keys.
>> >
>> > lastly, why are such huge byte[] arrays being passed around? one would
>> > think that since Java is pass-by-value, there would be a large overhead
>> > with passing such large byte arrays around.
>> >
>> > your help is appreciated.
>>
