[jira] [Commented] (HBASE-14267) In Mapreduce on HBase scenario, restart in TableInputFormat will result in getting wrong data.

Qianxi Zhang (JIRA) Thu, 20 Aug 2015 02:24:04 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704572#comment-14704572
 ]


Qianxi Zhang commented on HBASE-14267:
--------------------------------------

IMO, this problem is serious and hard to spot. Users should know that the data 
must not be modified.
In our use case the bug is hidden: it occurs only when the lease expires; 
otherwise it does not.
 
There are three ways to solve this problem:
1. do not allow the row returned by Result.getRow() to be modified;
2. return a clone of the data;
3. state in the API doc that the data should not be modified.
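
To illustrate option 2, here is a minimal, self-contained sketch. It does not 
use the HBase classes: SharedRowHolder and CopyingRowHolder are hypothetical 
stand-ins for Result, written only to show how a getter that hands out its 
cached array differs from one that returns a clone.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowCopySketch {

    /** Hypothetical stand-in for Result: hands out the cached array itself. */
    static final class SharedRowHolder {
        private final byte[] row;
        SharedRowHolder(byte[] row) { this.row = row; }
        byte[] getRow() { return row; }
    }

    /** Hypothetical variant for option 2: every call returns a fresh copy. */
    static final class CopyingRowHolder {
        private final byte[] row;
        CopyingRowHolder(byte[] row) { this.row = row.clone(); }
        byte[] getRow() { return row.clone(); }
    }

    /** Reverses the array in place, like the mapper described in the issue. */
    static void reverse(byte[] b) {
        for (int i = 0, j = b.length - 1; i < j; i++, j--) {
            byte t = b[i];
            b[i] = b[j];
            b[j] = t;
        }
    }

    public static void main(String[] args) {
        byte[] original = "row-001".getBytes(StandardCharsets.UTF_8);

        SharedRowHolder shared = new SharedRowHolder(original.clone());
        reverse(shared.getRow());
        // The holder's internal state changed along with the caller's array.
        System.out.println("shared holder still intact:  "
            + Arrays.equals(shared.getRow(), original));

        CopyingRowHolder copying = new CopyingRowHolder(original.clone());
        reverse(copying.getRow());
        // Only the caller's private copy was reversed.
        System.out.println("copying holder still intact: "
            + Arrays.equals(copying.getRow(), original));
    }
}
```

The cost of option 2 is one extra array allocation per getRow() call, which 
may matter for large scans; option 3 avoids that cost but relies on every 
caller reading the doc.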


> In Mapreduce on HBase scenario, restart in TableInputFormat will result in 
> getting wrong data.
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14267
>                 URL: https://issues.apache.org/jira/browse/HBASE-14267
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, mapreduce
>            Reporter: Qianxi Zhang
>            Assignee: Qianxi Zhang
>         Attachments: HBASE_14267_trunk_v1.patch
>
>
> When I run a MapReduce job on HBase, I modify the row obtained from 
> Result.getRow(), for example by reversing it. Because my program's data 
> handling is very complicated, it takes a long time, and the lease in the 
> region server expires. 
> Result#195
> {code}
>   public byte [] getRow() {
>     if (this.row == null) {
>       this.row = (this.cells == null || this.cells.length == 0) ?
>           null :
>           CellUtil.cloneRow(this.cells[0]);
>     }
>     return this.row;
>   }
> {code}
> TableInputFormat then restarts the scan from the last row, but that row has 
> been modified, so it reads the wrong data.
> TableRecordReaderImpl#218
> {code}
>       } catch (IOException e) {
>         // do not retry if the exception tells us not to do so
>         if (e instanceof DoNotRetryIOException) {
>           throw e;
>         }
>         // try to handle all other IOExceptions by restarting
>         // the scanner, if the second call fails, it will be rethrown
>         LOG.info("recovered from " + StringUtils.stringifyException(e));
>         if (lastSuccessfulRow == null) {
>           LOG.warn("We are restarting the first next() invocation," +
>               " if your mapper has restarted a few other times like this" +
>               " then you should consider killing this job and investigate" +
>               " why it's taking so long.");
>         }
>         if (lastSuccessfulRow == null) {
>           restart(scan.getStartRow());
>         } else {
>           restart(lastSuccessfulRow);
>           scanner.next();    // skip presumed already mapped row
>         }
>         value = scanner.next();
>         if (value != null && value.isStale()) numStale++;
>         numRestarts++;
>       }
>       if (value != null && value.size() > 0) {
>         key.set(value.getRow());
>         lastSuccessfulRow = key.get();
>         return true;
>       }
> {code}
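
The restart problem in the quoted snippet can be reproduced without HBase. 
The sketch below is a simplified model, not the real TableRecordReaderImpl: 
the Reader class and its copyOnRemember flag are hypothetical. It assumes the 
reader remembers the same array it hands to the mapper, which is what the 
key.set(...) / key.get() pair above appears to do, and shows that copying the 
row when it is remembered keeps the restart point intact.

```java
import java.util.Arrays;

public class RestartRowSketch {

    /** Hypothetical, simplified reader that remembers its restart point. */
    static final class Reader {
        private final boolean copyOnRemember;
        private byte[] lastSuccessfulRow;

        Reader(boolean copyOnRemember) { this.copyOnRemember = copyOnRemember; }

        /** Hands the row to the caller and remembers it for a later restart. */
        byte[] next(byte[] rowFromResult) {
            lastSuccessfulRow = copyOnRemember
                ? Arrays.copyOf(rowFromResult, rowFromResult.length)
                : rowFromResult;
            return rowFromResult;
        }

        /** Row the scan would be restarted from after a lease expiry. */
        byte[] restartRow() { return lastSuccessfulRow; }
    }

    /** In-place reversal, standing in for the mapper's modification. */
    static void reverse(byte[] b) {
        for (int i = 0, j = b.length - 1; i < j; i++, j--) {
            byte t = b[i];
            b[i] = b[j];
            b[j] = t;
        }
    }

    public static void main(String[] args) {
        byte[] expected = {1, 2, 3};

        Reader aliasing = new Reader(false);
        reverse(aliasing.next(new byte[] {1, 2, 3}));
        System.out.println("aliasing reader restart row intact: "
            + Arrays.equals(aliasing.restartRow(), expected));

        Reader copying = new Reader(true);
        reverse(copying.next(new byte[] {1, 2, 3}));
        System.out.println("copying reader restart row intact:  "
            + Arrays.equals(copying.restartRow(), expected));
    }
}
```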



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
