[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562591#comment-14562591 ]
Bhupendra Kumar Jain commented on HBASE-13702:
----------------------------------------------
What is the scope of the dry-run functionality?
As per the current patch, a dry run executes the same map task, which
internally performs various operations (parsing the text data, creating
Put objects, creating Cell objects, tags, etc.). These operations consume
extra time and are not actually required for a dry run. I think a
dry run should finish very quickly.
If the scope of the dry run is only to validate the parsing of the data, then
I think it would be better to have a new map task for the dry run.
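
To illustrate what a parse-only pass could look like, here is a minimal
sketch in plain Java. It is not code from the patch and does not use the
ImportTsv or MapReduce APIs; the class name DryRunValidator, the method
names, and the validity rule (an exact tab-separated column count with a
non-empty first field standing in for the row key) are all assumptions made
for illustration. The point is only that validation needs no Put or Cell
objects.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical parse-only validator (not part of ImportTsv).
 * It checks the shape of each line and never builds Put or Cell objects.
 */
public class DryRunValidator {

    /**
     * Assumed rule: a line is valid when it has exactly expectedColumns
     * tab-separated fields and the first field (assumed row key) is non-empty.
     */
    public static boolean isValid(String line, int expectedColumns) {
        // A limit of -1 keeps trailing empty fields, so "a\tb\t" has 3 fields.
        String[] fields = line.split("\t", -1);
        return fields.length == expectedColumns && !fields[0].isEmpty();
    }

    /** Returns the lines that fail validation, in input order. */
    public static List<String> findBadLines(List<String> lines, int expectedColumns) {
        List<String> bad = new ArrayList<>();
        for (String line : lines) {
            if (!isValid(line, expectedColumns)) {
                bad.add(line);
            }
        }
        return bad;
    }
}
```

A dedicated dry-run map task could apply a check like isValid to each input
line and only log or count the failures, which is cheaper than the full
import path.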
> ImportTsv: Add dry-run functionality and log bad rows
> -----------------------------------------------------
>
> Key: HBASE-13702
> URL: https://issues.apache.org/jira/browse/HBASE-13702
> Project: HBase
> Issue Type: New Feature
> Reporter: Apekshit Sharma
> Assignee: Apekshit Sharma
> Attachments: HBASE-13702.patch
>
>
> ImportTSV job skips bad records by default (keeps a count though).
> -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is
> encountered.
> Being able to easily determine which rows in an input are corrupted, rather
> than failing on one row at a time, seems like a good feature to have.
> Moreover, there should be 'dry-run' functionality in such kinds of tools,
> which essentially does a quick run of the tool without making any changes but
> reports any errors/warnings and success/failure.
> To identify corrupted rows, simply logging them should be enough. In the worst
> case, all rows will be logged and the size of the logs will be the same as the
> input size, which seems fine. However, the user might have to do some work
> figuring out where the logs are. Is there some link we can show to the user
> when the tool starts which can help them with that?
> For the dry run, we can simply use if-else to skip over writing out KVs, and
> any other mutations, if present.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)