[
https://issues.apache.org/jira/browse/HBASE-14380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734734#comment-14734734
]
Bhupendra Kumar Jain commented on HBASE-14380:
----------------------------------------------
TextSortReducer receives its input grouped by rowkey, with all text lines for
that rowkey passed as Iterable values.
{code}
protected void reduce(
    ImmutableBytesWritable rowKey,
    java.lang.Iterable<Text> lines,
    Reducer<ImmutableBytesWritable, Text, ImmutableBytesWritable, KeyValue>.Context context)
    throws java.io.IOException, InterruptedException
{code}
Inside the method, each line is parsed. When a bad line is encountered, the
method returns immediately instead of continuing with the next line, so all
subsequent data for that rowkey is ignored.
{code}
catch (ImportTsv.TsvParser.BadTsvLineException badLine) {
  if (skipBadLines) {
    System.err.println("Bad line." + badLine.getMessage());
    incrementBadLineCount(1);
    return; // exits reduce() for the whole rowkey, not just this line
  }
{code}
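To make the difference concrete, here is a minimal standalone sketch (not the actual HBase code) that runs the sample lines from the issue description through a toy reducer. The class name SkipBadLinesSketch, the BadLineException type, the three-field parse rule, and the continueOnBadLine flag are all hypothetical stand-ins. It also assumes that parsed lines are buffered and only written after the loop, which would explain why every line for the rowkey is lost rather than only the later ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipBadLinesSketch {

    /** Hypothetical stand-in for ImportTsv.TsvParser.BadTsvLineException. */
    static class BadLineException extends Exception {
        BadLineException(String message) {
            super(message);
        }
    }

    /** Toy parser (assumption): a valid line has exactly three comma-separated fields. */
    static String[] parse(String line) throws BadLineException {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new BadLineException("expected 3 fields but got " + fields.length);
        }
        return fields;
    }

    /**
     * Processes all lines of one rowkey group and returns what would be written.
     * With continueOnBadLine == false the method returns on the first bad line,
     * as in the snippet above; with true it skips only the bad line.
     */
    static List<String> reduce(List<String> lines, boolean continueOnBadLine) {
        List<String> written = new ArrayList<>();
        List<String> buffered = new ArrayList<>(); // assumption: output is buffered per rowkey
        for (String line : lines) {
            try {
                parse(line);
                buffered.add(line);
            } catch (BadLineException badLine) {
                System.err.println("Bad line." + badLine.getMessage());
                if (continueOnBadLine) {
                    continue; // skip only the offending line
                }
                return written; // leaves the method before anything is written
            }
        }
        written.addAll(buffered);
        return written;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("r1,1,v1", "r1", "r1,3,v3", "r1,4,v4");
        System.out.println("with return:   " + reduce(group, false));
        System.out.println("with continue: " + reduce(group, true));
    }
}
```

Under these assumptions, the return variant emits nothing for rowkey r1, while the continue variant keeps the three well-formed lines and drops only the bad one.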
> Correct data also getting skipped along with bad data in importTsv bulk load
> thru TsvImporterTextMapper
> -------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14380
> URL: https://issues.apache.org/jira/browse/HBASE-14380
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Bhupendra Kumar Jain
> Assignee: Bhupendra
>
> Consider the input data below:
> ROWKEY, TIMESTAMP, Col_Value
> r1,1,v1 >> Correct line
> r1 >> Bad line
> r1,3,v3 >> Correct line
> r1,4,v4 >> Correct line
> When this data is bulk loaded using importTsv with TsvImporterTextMapper as the mapper,
> all the lines are ignored even though skipBadLines is set to true.