[
https://issues.apache.org/jira/browse/AVRO-764?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel
]
Harsh J Chouraria logged work on AVRO-764:
------------------------------------------
Author: Harsh J Chouraria
Created on: 15/Feb/11 3:50 AM
Start Date: 15/Feb/11 3:50 AM
Worklog Time Spent: 5m
Issue Time Tracking
-------------------
Worklog Id: (was: 11263)
Time Spent: 5m
Remaining Estimate: 0h
> Possible issue with BinaryData.compare(...) used in Map/Reduce
> --------------------------------------------------------------
>
> Key: AVRO-764
> URL: https://issues.apache.org/jira/browse/AVRO-764
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.4.1
> Environment: Linux, CDH3
> Reporter: Harsh J Chouraria
> Assignee: Harsh J Chouraria
> Priority: Critical
> Labels: hadoop
> Fix For: 1.5.0
>
> Attachments: avro.mapred.binarydata.compare.r1.diff,
> avro.mapred.binarydata.compare.r2.diff
>
> Time Spent: 5m
> Remaining Estimate: 0h
>
> MapReduce's RawComparator feature has a call {{compare(b1, start1, length1,
> b2, start2, length2)}}, which is handled for Avro using
> {{BinaryData.compare(b1, start1, b2, start2, schema)}}.
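> To make the wiring concrete, here is a simplified sketch of how such a raw comparator
> ends up delegating to {{BinaryData.compare}}. The class and field names are illustrative
> (this is not the exact Avro mapred source); the point is that the framework-supplied
> {{l1}}/{{l2}} lengths are simply dropped:
> {code:title=Illustrative raw comparator (sketch)|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.io.BinaryData;
> import org.apache.hadoop.io.RawComparator;
>
> // Simplified illustration, not the actual Avro mapred comparator: the framework
> // hands us explicit lengths, but the four-argument BinaryData.compare has no
> // parameter to receive them.
> public class IllustrativeAvroRawComparator implements RawComparator<Object> {
>   private final Schema schema;
>
>   public IllustrativeAvroRawComparator(Schema schema) {
>     this.schema = schema;
>   }
>
>   public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>     // l1 and l2 are ignored; BinaryData falls back to the arrays' own lengths.
>     return BinaryData.compare(b1, s1, b2, s2, schema);
>   }
>
>   public int compare(Object o1, Object o2) {
>     throw new UnsupportedOperationException("raw comparison only in this sketch");
>   }
> }
> {code}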
> BinaryDecoder, used by the {{BinaryData.compare(b1, start1, b2, start2, schema)}}
> utility, has a sub-clause that handles byte array inputs of less than 16
> bytes in length. This is the exact code I am talking about:
> {code:title=BinaryDecoder.java Lines 879-893|borderStyle=solid}
> private ByteArrayByteSource(byte[] data, int start, int len) {
>   super();
>   // make sure data is not too small, otherwise getLong may try and
>   // read 10 bytes and get index out of bounds.
>   if (data.length < 16 || len < 16) {
>     this.data = new byte[16];
>     System.arraycopy(data, start, this.data, 0, len);
>     this.position = 0;
>     this.max = len;
>   } else {
>     // use the array passed in
>     this.data = data;
>     this.position = start;
>     this.max = start + len;
>   }
> }
> {code}
> This clause would fail during {{System.arraycopy}}, because the {{len}} value passed
> into the constructor is the {{length}} of the whole byte array input, rather than the
> record length given by the framework, which would be the right value to use [the
> {{BinaryData.compare()}} call does not take a length parameter at all]. As a result,
> {{System.arraycopy}} tries to copy {{length}} bytes starting at the {{start}} index,
> where {{length}} is the byte array's own {{.length}}, which overruns the source array
> and leads to an index-out-of-bounds exception.
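> To make the arithmetic concrete, here is a tiny stand-alone illustration (a made-up
> two-byte buffer, not the BinaryDecoder code itself) of the copy the constructor ends
> up attempting:
> {code:title=Out-of-bounds copy illustration|borderStyle=solid}
> public class BadCopyIllustration {
>   public static void main(String[] args) {
>     // Suppose two 1-byte records are encoded back to back, so data.length == 2.
>     // The byte source is told start == 1, but len arrives as data.length (2)
>     // instead of the real record length (1), so arraycopy needs source indices
>     // 1..2 and index 2 is out of bounds.
>     byte[] data = new byte[] { 0x02, 0x04 }; // zig-zag encodings of ints 1 and 2
>     byte[] copy = new byte[16];
>     System.arraycopy(data, 1, copy, 0, data.length); // ArrayIndexOutOfBoundsException
>   }
> }
> {code}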
> Here is a slightly low-level test case that should fail with an
> {{ArrayIndexOutOfBoundsException}} due to the bad {{System.arraycopy}} call:
> {code:title=TestBinaryData.java|borderStyle=solid}
> package org.apache.avro.io;
>
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.avro.Schema;
> import org.apache.avro.Schema.Field;
> import org.apache.avro.Schema.Type;
> import org.apache.avro.generic.GenericData.Record;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.junit.Assert;
> import org.junit.Test;
>
> public class TestBinaryData {
>
>   @Test
>   public void testCompare() {
>     // Prepare a record schema with a single int field for testing.
>     Field integerField = new Field("test", Schema.create(Type.INT), null, null);
>     List<Field> fields = new ArrayList<Field>();
>     fields.add(integerField);
>     Schema record = Schema.createRecord("test", null, null, false);
>     record.setFields(fields);
>
>     ByteArrayOutputStream b1 = new ByteArrayOutputStream(5);
>     ByteArrayOutputStream b2 = new ByteArrayOutputStream(5);
>     BinaryEncoder b1Enc = new BinaryEncoder(b1);
>     BinaryEncoder b2Enc = new BinaryEncoder(b2);
>     Record testDatum1 = new Record(record);
>     testDatum1.put(0, 1);
>     Record testDatum2 = new Record(record);
>     testDatum2.put(0, 2);
>     GenericDatumWriter<Record> gWriter = new GenericDatumWriter<Record>(record);
>     int start1 = 0, start2 = 0;
>
>     try {
>       // Write each datum twice; the size after the first write becomes the
>       // start offset handed to BinaryData.compare.
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       start1 = b1.size();
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       b1.close();
>
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       start2 = b2.size();
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       b2.close();
>
>       // Comparing from a non-zero start triggers the bad System.arraycopy.
>       BinaryData.compare(b1.toByteArray(), start1, b2.toByteArray(), start2, record);
>     } catch (IOException e) {
>       Assert.fail("IOException while writing records to output stream.");
>     }
>   }
> }
> {code}
> A solution would be to use the length supplied by the MapReduce framework itself
> instead of {{bytearray.length}} in its place. I'll attach a patch for this soon.
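> For illustration only (this is not the attached patch), a minimal sketch of that idea,
> assuming the decoder is built from the explicit (buffer, offset, length) triple the
> framework already knows; the helper name below is made up, and I believe
> {{DecoderFactory.createBinaryDecoder(byte[], int, int, BinaryDecoder)}} is the existing
> entry point that takes an explicit length:
> {code:title=Length-aware decoder construction (sketch)|borderStyle=solid}
> import org.apache.avro.io.BinaryDecoder;
> import org.apache.avro.io.DecoderFactory;
>
> public class LengthAwareDecoding {
>   // Hypothetical helper, name made up for this sketch: the decoder is created
>   // from the framework-supplied (buf, start, length) triple, so the byte source
>   // never has to assume that length == buf.length.
>   static BinaryDecoder decoderFor(byte[] buf, int start, int length) {
>     return DecoderFactory.defaultFactory()
>         .createBinaryDecoder(buf, start, length, null);
>   }
> }
> {code}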
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira