Possible issue with BinaryData.compare(...) used in Map/Reduce
--------------------------------------------------------------
Key: AVRO-764
URL: https://issues.apache.org/jira/browse/AVRO-764
Project: Avro
Issue Type: Bug
Components: doc
Affects Versions: 1.4.1
Environment: Linux, CDH3
Reporter: Harsh J Chouraria
Assignee: Harsh J Chouraria
Priority: Critical
Fix For: 1.5.0
MapReduce's RawComparator interface exposes a call {{compare(b1, start1, length1, b2,
start2, length2)}}, which is handled for Avro by {{BinaryData.compare(b1,
start1, b2, start2, schema)}}; note that the framework-supplied length arguments are dropped.
BinaryDecoder, which the {{BinaryData.compare(b1, start1, b2, start2, schema)}}
utility uses internally, has a sub-clause that handles byte array inputs shorter
than 16 bytes.
This is the exact code I am talking about:
{code:title=BinaryDecoder.java Lines 879-893|borderStyle=solid}
private ByteArrayByteSource(byte[] data, int start, int len) {
  super();
  // make sure data is not too small, otherwise getLong may try and
  // read 10 bytes and get index out of bounds.
  if (data.length < 16 || len < 16) {
    this.data = new byte[16];
    System.arraycopy(data, start, this.data, 0, len);
    this.position = 0;
    this.max = len;
  } else {
    // use the array passed in
    this.data = data;
    this.position = start;
    this.max = start + len;
  }
}
{code}
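For context, the 16-byte padding exists because the decoder's fast path may read ahead: a variable-length long in Avro's binary encoding occupies up to 10 bytes, so a bulk {{getLong}} read near the end of a small buffer needs headroom. A toy varint reader (my own sketch, not Avro's implementation) shows the 10-byte worst case:

```java
public class VarintHeadroom {
  // Decodes an unsigned little-endian base-128 varint. A long can take
  // up to 10 bytes (ceil(64 / 7)), which is why a fast-path decoder that
  // reads 10 bytes unconditionally needs padding near the buffer end.
  static long readVarint(byte[] buf, int pos) {
    long value = 0;
    for (int shift = 0; shift < 70; shift += 7) {
      byte b = buf[pos++];
      value |= (long) (b & 0x7f) << shift;
      if ((b & 0x80) == 0) return value;
    }
    throw new IllegalArgumentException("varint too long");
  }

  public static void main(String[] args) {
    // All 64 bits set (-1L as an unsigned varint) takes the full 10 bytes.
    byte[] max = new byte[] {(byte) 0xff, (byte) 0xff, (byte) 0xff,
        (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff,
        (byte) 0xff, 0x01};
    System.out.println(readVarint(max, 0)); // prints -1
  }
}
```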
This clause fails during {{System.arraycopy}}: the {{len}} argument passed into
the constructor is the {{length}} of the entire byte array, not the range length
supplied by the framework, which would be the right one [since the
{{BD.compare()}} call does not take it as a parameter]. {{arraycopy}} therefore
tries to copy {{data.length}} bytes starting at index {{start}}, and whenever
{{start}} is non-zero this reads past the end of the array and raises an index
out of bounds exception.
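The failure can be reproduced without Avro at all, since the arithmetic is a plain {{System.arraycopy}} bounds violation. A minimal sketch (class and method names are mine, not Avro's) of what the small-buffer branch effectively does when handed {{len = data.length}} with a non-zero {{start}}:

```java
public class ArrayCopyBug {
  // Mimics the small-buffer branch: System.arraycopy(data, start, dest, 0, len)
  // where the caller wrongly passes data.length as len instead of
  // data.length - start.
  static byte[] copyTail(byte[] data, int start, int len) {
    byte[] dest = new byte[16];
    System.arraycopy(data, start, dest, 0, len);
    return dest;
  }

  public static void main(String[] args) {
    byte[] data = new byte[10]; // a small serialized record, < 16 bytes
    int start = 5;              // framework-style non-zero offset

    // Correct call: copy only the bytes that remain after 'start'.
    copyTail(data, start, data.length - start); // in bounds

    // Buggy call: len is the whole array length, so the copy would read
    // positions start..start+len-1 = 5..14 of a 10-byte array.
    try {
      copyTail(data, start, data.length);
      System.out.println("no exception");
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("ArrayIndexOutOfBoundsException");
    }
  }
}
```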
Here is a slightly low-level test case that should fail with an
{{ArrayIndexOutOfBoundsException}} due to the bad {{System.arraycopy}} call:
{code:title=TestBinaryData.java|borderStyle=solid}
package org.apache.avro.io;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;
import org.apache.avro.Schema.Type;
import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericDatumWriter;
import org.junit.Assert;
import org.junit.Test;

public class TestBinaryData {
  @Test
  public void testCompare() {
    // Prepare a schema for testing.
    Field integerField = new Field("test", Schema.create(Type.INT), null, null);
    List<Field> fields = new ArrayList<Field>();
    fields.add(integerField);
    Schema record = Schema.createRecord("test", null, null, false);
    record.setFields(fields);

    ByteArrayOutputStream b1 = new ByteArrayOutputStream(5);
    ByteArrayOutputStream b2 = new ByteArrayOutputStream(5);
    BinaryEncoder b1Enc = new BinaryEncoder(b1);
    BinaryEncoder b2Enc = new BinaryEncoder(b2);
    Record testDatum1 = new Record(record);
    testDatum1.put(0, 1);
    Record testDatum2 = new Record(record);
    testDatum2.put(0, 2);
    GenericDatumWriter<Record> gWriter = new GenericDatumWriter<Record>(record);
    int start1 = 0, start2 = 0;

    try {
      // Write two records into each buffer; the compare call then starts
      // at the second record, i.e. at a non-zero offset.
      gWriter.write(testDatum1, b1Enc);
      b1Enc.flush();
      start1 = b1.size();
      gWriter.write(testDatum1, b1Enc);
      b1Enc.flush();
      b1.close();

      gWriter.write(testDatum2, b2Enc);
      b2Enc.flush();
      start2 = b2.size();
      gWriter.write(testDatum2, b2Enc);
      b2Enc.flush();
      b2.close();

      BinaryData.compare(b1.toByteArray(), start1, b2.toByteArray(), start2,
          record);
    } catch (IOException e) {
      Assert.fail("IOException while writing records to output stream.");
    }
  }
}
{code}
A solution would be to let the length, as given by the MapReduce framework
itself, be used instead of {{bytearray.length}}. I'll attach a patch for this
soon.
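In plain-Java terms, the fix corresponds to threading the caller's range lengths all the way down to the copy, so that {{start + len}} never exceeds {{data.length}}. A minimal sketch (class and method names are mine, not the actual patch, and lexicographic byte order stands in for schema-aware decoding):

```java
public class LengthAwareCompare {
  // Stand-in for constructing a decoder source over a sub-range. The real
  // code avoids the copy for buffers >= 16 bytes; copying unconditionally
  // keeps the sketch simple. Crucially, 'len' is the caller-supplied range
  // length, so start + len stays within data.length.
  static byte[] slice(byte[] data, int start, int len) {
    byte[] copy = new byte[Math.max(len, 16)];
    System.arraycopy(data, start, copy, 0, len);
    return copy;
  }

  // A length-aware entry point in the spirit of RawComparator's signature:
  // both offsets AND lengths are passed through.
  static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    byte[] d1 = slice(b1, s1, l1);
    byte[] d2 = slice(b2, s2, l2);
    // Real code would decode per the schema; byte order stands in here.
    for (int i = 0; i < Math.min(l1, l2); i++) {
      int c = (d1[i] & 0xff) - (d2[i] & 0xff);
      if (c != 0) return c;
    }
    return l1 - l2;
  }

  public static void main(String[] args) {
    byte[] buf = {0, 0, 0, 1, 0, 0, 0, 2}; // two tiny "records" back to back
    // Non-zero offset with the correct range length: no out-of-bounds copy.
    System.out.println(compare(buf, 0, 4, buf, 4, 4) < 0 ? "b1 < b2" : "?");
  }
}
```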
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira