[jira] [Created] (HBASE-14267) In Mapreduce on HBase scenario, restart in TableInputFormat will result in getting wrong data.

Qianxi Zhang (JIRA) Thu, 20 Aug 2015 01:15:13 -0700

Qianxi Zhang created HBASE-14267:
------------------------------------

             Summary: In Mapreduce on HBase scenario, restart in 
TableInputFormat will result in getting wrong data.
                 Key: HBASE-14267
                 URL: https://issues.apache.org/jira/browse/HBASE-14267
             Project: HBase
          Issue Type: Bug
          Components: Client, mapreduce
    Affects Versions: 1.1.0.1
            Reporter: Qianxi Zhang
            Assignee: Qianxi Zhang



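When I run a MapReduce job on HBase, my mapper modifies in place the row 
returned by Result.getRow(), for example by reversing its bytes. Because my 
program does complicated processing on the data, it takes a long time, and the 
scanner lease on the region server expires. Result.getRow() caches the row 
array and returns the same instance on every call: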
Result#195
{code}
  public byte [] getRow() {
    if (this.row == null) {
      this.row = (this.cells == null || this.cells.length == 0) ?
          null :
          CellUtil.cloneRow(this.cells[0]);
    }
    return this.row;
  }
{code}
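
The caching behaviour above can be illustrated without an HBase cluster. The following is only a minimal sketch: FakeResult and reverseInPlace are hypothetical stand-ins (not HBase classes), and Arrays.copyOf is assumed to play the role of CellUtil.cloneRow. It shows that two callers of getRow() receive the same array, so an in-place change made by one is seen by the other.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CachedRowDemo {

    /** Hypothetical stand-in for Result: clones the row once, then caches it. */
    static final class FakeResult {
        private final byte[] firstCellRow;
        private byte[] row;

        FakeResult(byte[] firstCellRow) {
            this.firstCellRow = firstCellRow;
        }

        byte[] getRow() {
            if (row == null) {
                // Assumption: stands in for CellUtil.cloneRow(this.cells[0]).
                row = Arrays.copyOf(firstCellRow, firstCellRow.length);
            }
            return row; // the same array instance on every later call
        }
    }

    /** Reverses the array in place, like the mapper described in this report. */
    static void reverseInPlace(byte[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            byte t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    public static void main(String[] args) {
        FakeResult result = new FakeResult("row-001".getBytes(StandardCharsets.UTF_8));
        byte[] seenByReader = result.getRow(); // e.g. kept by the record reader
        byte[] seenByMapper = result.getRow(); // e.g. handed to user code
        reverseInPlace(seenByMapper);
        System.out.println(seenByReader == seenByMapper); // prints "true"
        System.out.println(new String(seenByReader, StandardCharsets.UTF_8)); // prints "100-wor"
    }
}
```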

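After the lease expires, TableInputFormat restarts the scan from the last 
successfully read row (lastSuccessfulRow). That row has already been modified 
in place by the mapper, so the scan restarts from the wrong position and reads 
wrong data.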
TableRecordReaderImpl#218
{code}
      } catch (IOException e) {
        // do not retry if the exception tells us not to do so
        if (e instanceof DoNotRetryIOException) {
          throw e;
        }
        // try to handle all other IOExceptions by restarting
        // the scanner, if the second call fails, it will be rethrown
        LOG.info("recovered from " + StringUtils.stringifyException(e));
        if (lastSuccessfulRow == null) {
          LOG.warn("We are restarting the first next() invocation," +
              " if your mapper has restarted a few other times like this" +
              " then you should consider killing this job and investigate" +
              " why it's taking so long.");
        }
        if (lastSuccessfulRow == null) {
          restart(scan.getStartRow());
        } else {
          restart(lastSuccessfulRow);
          scanner.next();    // skip presumed already mapped row
        }
        value = scanner.next();
        if (value != null && value.isStale()) numStale++;
        numRestarts++;
      }
      if (value != null && value.size() > 0) {
        key.set(value.getRow());
        lastSuccessfulRow = key.get();
        return true;
      }
{code}
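
One possible direction, shown here only as a sketch and not as the fix adopted for this issue, is for the reader to keep its own copy of the row instead of a reference to the array the mapper can see. The Reader class, the copyRow flag, and restartRowAfterMapperReversal below are hypothetical names for illustration; only java.util.Arrays is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RestartRowDemo {

    /** Hypothetical stand-in for the part of the record reader that tracks the restart row. */
    static final class Reader {
        private final boolean copyRow;
        private byte[] lastSuccessfulRow;

        Reader(boolean copyRow) {
            this.copyRow = copyRow;
        }

        void recordSuccess(byte[] row) {
            // With copyRow == false the reader shares the array with the mapper.
            lastSuccessfulRow = copyRow ? Arrays.copyOf(row, row.length) : row;
        }

        String restartRow() {
            return new String(lastSuccessfulRow, StandardCharsets.UTF_8);
        }
    }

    /** Returns the row a restart would use after the mapper reversed the row in place. */
    static String restartRowAfterMapperReversal(boolean copyRow) {
        byte[] row = "row-001".getBytes(StandardCharsets.UTF_8);
        Reader reader = new Reader(copyRow);
        reader.recordSuccess(row);
        // The mapper reverses the shared array in place.
        for (int i = 0, j = row.length - 1; i < j; i++, j--) {
            byte t = row[i];
            row[i] = row[j];
            row[j] = t;
        }
        return reader.restartRow();
    }

    public static void main(String[] args) {
        System.out.println(restartRowAfterMapperReversal(false)); // prints "100-wor"
        System.out.println(restartRowAfterMapperReversal(true));  // prints "row-001"
    }
}
```

Until something along these lines is in place, a mapper can avoid the problem on its own side by copying the row (for example with Arrays.copyOf) before modifying it.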



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
