[jira] [Commented] (HBASE-14380) Correct data also getting skipped along with bad data in importTsv bulk load thru TsvImporterTextMapper

Bhupendra Kumar Jain (JIRA) Tue, 08 Sep 2015 05:31:56 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734734#comment-14734734
 ]


Bhupendra Kumar Jain commented on HBASE-14380:
----------------------------------------------

TextSortReducer receives its input grouped by row key, with all text lines 
for that key passed in as the Iterable values. 
{code}
protected void reduce(ImmutableBytesWritable rowKey, java.lang.Iterable<Text> lines,
      Reducer<ImmutableBytesWritable, Text, ImmutableBytesWritable, KeyValue>.Context context)
      throws java.io.IOException, InterruptedException
{code}
Inside the method, each line is parsed. When a bad line is encountered, the 
method returns immediately instead of continuing with the next line, so all 
subsequent lines for that row key are ignored. 

{code}
catch (ImportTsv.TsvParser.BadTsvLineException badLine) {
          if (skipBadLines) {
            System.err.println("Bad line." + badLine.getMessage());
            incrementBadLineCount(1);
            return; // exits reduce(), so the remaining lines are never processed
          }
{code}
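
To make the difference between returning and continuing concrete, here is a minimal standalone sketch. It does not use the HBase classes: BadLineSkipSketch, BadLineException, Result and parse are hypothetical names introduced only for illustration. It also assumes that parsed values are buffered and written out only after the loop finishes, which would explain why the issue description reports all lines being dropped. Treat it as an illustration of the control flow, not as the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Standalone sketch; none of these names come from HBase. */
public class BadLineSkipSketch {

    /** Hypothetical stand-in for ImportTsv.TsvParser.BadTsvLineException. */
    static class BadLineException extends Exception {
        BadLineException(String message) { super(message); }
    }

    /** Outcome of processing the lines of one row key. */
    static class Result {
        final List<String> emitted = new ArrayList<>();
        int badLines = 0;
    }

    /** Accepts only lines of the form rowkey,timestamp,value. */
    static String parse(String line) throws BadLineException {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new BadLineException("Expected 3 fields: " + line);
        }
        return fields[0] + "@" + fields[1] + "=" + fields[2];
    }

    /**
     * Processes all lines of one row key.
     * returnOnBadLine = true mimics the reported behaviour (early return);
     * returnOnBadLine = false skips only the bad line and keeps going.
     */
    static Result reduce(List<String> lines, boolean returnOnBadLine) {
        Result result = new Result();
        // Assumption: values are buffered here and emitted after the loop.
        List<String> buffered = new ArrayList<>();
        for (String line : lines) {
            try {
                buffered.add(parse(line));
            } catch (BadLineException badLine) {
                result.badLines++;
                if (returnOnBadLine) {
                    return result; // buffered values are never emitted
                }
                continue; // skip only this line
            }
        }
        result.emitted.addAll(buffered);
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("r1,1,v1", "r1", "r1,3,v3", "r1,4,v4");
        System.out.println(reduce(lines, true).emitted);  // prints []
        System.out.println(reduce(lines, false).emitted); // prints [r1@1=v1, r1@3=v3, r1@4=v4]
    }
}
```

With the sample input from the issue description, the early return emits nothing, while continuing past the bad line keeps the three correct lines and counts one bad line.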

> Correct data also getting skipped along with bad data in importTsv bulk load 
> thru TsvImporterTextMapper
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14380
>                 URL: https://issues.apache.org/jira/browse/HBASE-14380
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Bhupendra
>
> Consider the input data below:
> ROWKEY, TIMESTAMP, Col_Value
> r1,1,v1       >> Correct line
> r1             >> Bad line
> r1,3,v3       >> Correct line
> r1,4,v4       >> Correct line
> When data is bulk loaded using importTsv with TsvImporterTextMapper as the 
> mapper, all the lines are ignored even though skipBadLines is set to true. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
