HBASE-1765 broke MapReduce when using Result.list()
---------------------------------------------------
                 Key: HBASE-1856
                 URL: https://issues.apache.org/jira/browse/HBASE-1856
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Lars George
            Priority: Critical
             Fix For: 0.20.1

Not sure if it is just me, but using MR over HBase employing a TableReducer is not working. After the first row is read, all subsequent rows get the same Results as that very first row. After tracing this from the Map phase, I found the culprit in Result and the delayed field parsing change from HBASE-1765.

This is the code I use in the reduce():

{code}
@Override
protected void reduce(ImmutableBytesWritable key, Iterable<Result> values, Context context)
    throws IOException, InterruptedException {
  String skey = Bytes.toString(key.get());
  context.getCounter(CountersTotals.ROWS).increment(1);
  for (Result result : values) {
    for (KeyValue kv : result.list()) {
      try {
        if (LOG.isDebugEnabled()) LOG.debug("reduce: key -> " + skey + ", kv -> " + kv);
        ...
{code}

Here is the current list() implementation:

{code}
public List<KeyValue> list() {
  if (this.kvs == null) {
    readFields();
  }
  return isEmpty() ? null : Arrays.asList(sorted());
}
{code}

The problem is that readFields(DataInput) does not clear kvs!

{code}
public void readFields(final DataInput in) throws IOException {
  familyMap = null;
  row = null;
  int totalBuffer = in.readInt();
  if (totalBuffer == 0) {
    bytes = null;
    return;
  }
  byte [] raw = new byte[totalBuffer];
  in.readFully(raw, 0, totalBuffer);
  bytes = new ImmutableBytesWritable(raw, 0, totalBuffer);
}
{code}

The above is called by the MR framework's WritableSerialization for each map output. But since "kvs" is already set from the previous record, "list()" skips the parsing step and returns the old data!

I assume the only change needed is clearing kvs as well:

{code}
public void readFields(final DataInput in) throws IOException {
  familyMap = null;
  row = null;
  kvs = null;
  ....
{code}

I'll test that now and report.
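To illustrate the failure mode outside of HBase, here is a minimal sketch of the same pattern. It is not the HBase code: LazyRecord, StaleCacheSketch, serialize() and readTwice() are hypothetical names made up for this example, and the payload is a comma-separated string rather than real KeyValues. It assumes, as the report above implies, that the framework deserializes every record into one reused instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StaleCacheSketch {

    /** Hypothetical stand-in for Result: raw bytes plus a lazily parsed cache. */
    static class LazyRecord {
        private byte[] raw;            // plays the role of Result.bytes
        private List<String> parsed;   // plays the role of Result.kvs
        private final boolean clearCacheOnRead;

        LazyRecord(boolean clearCacheOnRead) {
            this.clearCacheOnRead = clearCacheOnRead;
        }

        /** Reads a length-prefixed payload; optionally drops the parsed cache. */
        void readFields(DataInput in) throws IOException {
            int totalBuffer = in.readInt();
            raw = new byte[totalBuffer];
            in.readFully(raw, 0, totalBuffer);
            if (clearCacheOnRead) {
                parsed = null;         // the proposed fix: invalidate the cache
            }
        }

        /** Parses the raw bytes only if no cached list exists yet. */
        List<String> list() {
            if (parsed == null) {
                parsed = new ArrayList<>();
                for (String s : new String(raw, StandardCharsets.UTF_8).split(",")) {
                    parsed.add(s);
                }
            }
            return parsed;
        }
    }

    /** Writes the payload as a 4-byte length followed by its UTF-8 bytes. */
    static DataInput serialize(String payload) throws IOException {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(body.length);
        out.write(body);
        out.flush();
        return new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    }

    /** Deserializes two records into ONE reused instance, returns the second list(). */
    static List<String> readTwice(boolean clearCacheOnRead) {
        try {
            LazyRecord record = new LazyRecord(clearCacheOnRead);
            record.readFields(serialize("row1-a,row1-b"));
            record.list();             // first access parses and caches row 1
            record.readFields(serialize("row2-a,row2-b"));
            return record.list();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without clearing: " + readTwice(false)); // [row1-a, row1-b]
        System.out.println("with clearing:    " + readTwice(true));  // [row2-a, row2-b]
    }
}
```

Without the reset, the second call to list() sees a non-null cache and never looks at the freshly read bytes, which matches the symptom of every row showing the first row's data. Resetting the cache inside readFields() keeps the lazy parsing but ties the cache's lifetime to the bytes it was built from.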