[ https://issues.apache.org/jira/browse/HBASE-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943461#comment-14943461 ]
Ted Yu commented on HBASE-14520:
--------------------------------
lgtm
> Optimize the number of calls for tags creation in bulk load
> -----------------------------------------------------------
>
> Key: HBASE-14520
> URL: https://issues.apache.org/jira/browse/HBASE-14520
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Bhupendra Kumar Jain
> Assignee: Bhupendra Kumar Jain
> Fix For: 2.0.0
>
> Attachments: HBASE-14520.patch
>
>
> At present, the TTL and visibility expression are specified once per TSV
> line, i.e. the values, and therefore the tags, are the same for all the
> columns present in that line. As per the code, however, a new list of tags
> is created for each cell. Instead of creating new tags for each cell, the
> tags created once for the line can be reused by the other cells.
> Assume 1 million rows and 1000 columns. Currently, tag creation happens
> 1M * 1000 times. If the tags are reused, tag creation is reduced to 1M
> times (i.e. once per TSV line).
> This applies to both the TsvImporterMapper and TextSortReducer logic.
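The reuse described in the issue can be illustrated with a minimal sketch. It is not the attached patch and does not use the HBase API: `Tag`, `Cell`, `createTags`, the tag type constants, and the counter are hypothetical stand-ins chosen only to make the difference in the number of tag-creation calls visible. With 1M rows and 1000 columns, the per-cell variant would call `createTags` about 10^9 times, the per-line variant about 10^6 times.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins only; these are NOT the HBase Tag/Cell classes.
class TagReuseSketch {
    // Assumed tag type markers for this sketch (not taken from HBase).
    static final byte TTL_TAG = 1;
    static final byte VISIBILITY_TAG = 2;

    /** Minimal stand-in for a cell tag: a type byte plus a value. */
    static final class Tag {
        final byte type;
        final String value;
        Tag(byte type, String value) { this.type = type; this.value = value; }
    }

    /** Minimal stand-in for a cell: row, column, value, and its tags. */
    static final class Cell {
        final String row, column, value;
        final List<Tag> tags;
        Cell(String row, String column, String value, List<Tag> tags) {
            this.row = row; this.column = column; this.value = value; this.tags = tags;
        }
    }

    /** Counts how many times a tag list was built, for comparison. */
    static int tagListsCreated = 0;

    /** Builds the tag list from the per-line TTL and visibility expression. */
    static List<Tag> createTags(long ttl, String visibility) {
        tagListsCreated++;
        List<Tag> tags = new ArrayList<>();
        if (ttl > 0) tags.add(new Tag(TTL_TAG, Long.toString(ttl)));
        if (visibility != null) tags.add(new Tag(VISIBILITY_TAG, visibility));
        // Read-only, so sharing one list across many cells is safe.
        return Collections.unmodifiableList(tags);
    }

    /** Behaviour described as current: a new tag list for every cell. */
    static List<Cell> cellsWithTagsPerCell(String row, String[] columns, String[] values,
                                           long ttl, String visibility) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new Cell(row, columns[i], values[i], createTags(ttl, visibility)));
        }
        return cells;
    }

    /** Proposed behaviour: build the tags once per line and reuse them. */
    static List<Cell> cellsWithTagsPerLine(String row, String[] columns, String[] values,
                                           long ttl, String visibility) {
        List<Tag> tags = createTags(ttl, visibility); // once per TSV line
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new Cell(row, columns[i], values[i], tags));
        }
        return cells;
    }
}
```

Sharing is only safe because the tag list is identical for every column of a line and is never modified afterwards; the sketch makes that explicit by returning an unmodifiable list.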