[
https://issues.apache.org/jira/browse/HBASE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705230#comment-14705230
]
Ted Yu edited comment on HBASE-14267 at 8/20/15 4:35 PM:
---------------------------------------------------------
I think what [~chenheng] proposed should be good.
was (Author: [email protected]):
I think this should be good.
> In Mapreduce on HBase scenario, restart in TableInputFormat will result in
> getting wrong data.
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-14267
> URL: https://issues.apache.org/jira/browse/HBASE-14267
> Project: HBase
> Issue Type: Bug
> Components: Client, mapreduce
> Reporter: Qianxi Zhang
> Assignee: Qianxi Zhang
> Attachments: HBASE_14267_trunk_v1.patch
>
>
> When I run a MapReduce job on HBase, I modify the row returned by
> Result.getRow(), for example by reversing it. Because my program's data
> handling is complicated, it takes a long time, and the lease in the region
> server expires.
> Result#195
> {code}
> public byte [] getRow() {
>   if (this.row == null) {
>     this.row = (this.cells == null || this.cells.length == 0) ?
>         null :
>         CellUtil.cloneRow(this.cells[0]);
>   }
>   return this.row;
> }
> {code}
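> TableInputFormat then restarts the scan from the last successful row.
> That row appears to be the same byte array that Result.getRow() caches and
> returns, so it has been modified too, and the restarted scan reads wrong
> data.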
> TableRecordReaderImpl#218
> {code}
> } catch (IOException e) {
>   // do not retry if the exception tells us not to do so
>   if (e instanceof DoNotRetryIOException) {
>     throw e;
>   }
>   // try to handle all other IOExceptions by restarting
>   // the scanner, if the second call fails, it will be rethrown
>   LOG.info("recovered from " + StringUtils.stringifyException(e));
>   if (lastSuccessfulRow == null) {
>     LOG.warn("We are restarting the first next() invocation," +
>         " if your mapper has restarted a few other times like this" +
>         " then you should consider killing this job and investigate" +
>         " why it's taking so long.");
>   }
>   if (lastSuccessfulRow == null) {
>     restart(scan.getStartRow());
>   } else {
>     restart(lastSuccessfulRow);
>     scanner.next(); // skip presumed already mapped row
>   }
>   value = scanner.next();
>   if (value != null && value.isStale()) numStale++;
>   numRestarts++;
> }
> if (value != null && value.size() > 0) {
>   key.set(value.getRow());
>   lastSuccessfulRow = key.get();
>   return true;
> }
> {code}
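
The aliasing described above can be reproduced without HBase. The following is a minimal sketch, not the attached patch: FakeResult, rememberByReference and rememberByCopy are hypothetical names that only mimic how getRow() hands out its cached array and how the record reader remembers the last row. A defensive copy is one possible remedy; whether the patch takes this approach is not shown in this message.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowAliasingSketch {

    /** Hypothetical stand-in for Result: returns its cached row array. */
    static final class FakeResult {
        private final byte[] row;

        FakeResult(byte[] row) {
            this.row = row;
        }

        byte[] getRow() {
            return row; // same reference on every call, no copy
        }
    }

    /** Reverses the array in place, like the mapper in the report. */
    static void reverseInPlace(byte[] b) {
        for (int i = 0, j = b.length - 1; i < j; i++, j--) {
            byte tmp = b[i];
            b[i] = b[j];
            b[j] = tmp;
        }
    }

    /** Mimics the reported behavior: keeps the caller-visible array. */
    static byte[] rememberByReference(FakeResult r) {
        return r.getRow();
    }

    /** Possible remedy (assumption): keep a private copy of the row. */
    static byte[] rememberByCopy(FakeResult r) {
        byte[] row = r.getRow();
        return Arrays.copyOf(row, row.length);
    }

    public static void main(String[] args) {
        FakeResult result =
            new FakeResult("row-001".getBytes(StandardCharsets.UTF_8));

        byte[] aliased = rememberByReference(result);
        byte[] copied = rememberByCopy(result);

        // The mapper modifies the row it was handed.
        reverseInPlace(result.getRow());

        // The aliased restart row changed with it; the copy did not.
        System.out.println(new String(aliased, StandardCharsets.UTF_8));
        System.out.println(new String(copied, StandardCharsets.UTF_8));
    }
}
```

With the copy, a restart would resume from the row that was actually read, whatever the mapper does to its own array. The cost is one extra array allocation per row.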
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)