Bhupendra Kumar Jain created HBASE-14520:
--------------------------------------------
Summary: Optimize the number of calls for tags creation in bulk
load
Key: HBASE-14520
URL: https://issues.apache.org/jira/browse/HBASE-14520
Project: HBase
Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Bhupendra Kumar Jain
Assignee: Bhupendra Kumar Jain
At present, the TTL and visibility expression are specified once per TSV line,
i.e. their values, and therefore the tags, are the same for all the columns
present in that line. In the current code, however, a new list of tags is
created for each cell. Instead of creating new tags for each cell, the tags
created once for the line can be reused by the other cells of that line.
Assume 1 million rows and 1000 columns. Currently, tag creation happens
1M * 1000 times (one billion calls). If the tags are reused, tag creation is
reduced to 1M times (i.e. once per TSV line).
This is applicable in both TsvImporterMapper and TextSortReducer logic.
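
The reuse described above could be sketched roughly as follows. This is only an
illustration under stated assumptions: SimpleTag, SimpleCell, createTags and the
two cellsWith... methods are hypothetical stand-ins, not the real HBase tag or
cell classes and not the actual TsvImporterMapper / TextSortReducer code. The
counter exists purely to make the number of tag-list creations visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TagReuseSketch {
    // Hypothetical stand-in for a tag; NOT the real HBase Tag class.
    static final class SimpleTag {
        final String type;
        final String value;
        SimpleTag(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // Hypothetical stand-in for a cell: one column value plus its tags.
    static final class SimpleCell {
        final String column;
        final String value;
        final List<SimpleTag> tags;
        SimpleCell(String column, String value, List<SimpleTag> tags) {
            this.column = column;
            this.value = value;
            this.tags = tags;
        }
    }

    // Counts how many times a tag list is built (for illustration only).
    static int tagListCreations = 0;

    // Builds the tag list for one TSV line from its TTL and visibility expression.
    static List<SimpleTag> createTags(long ttl, String visibilityExpr) {
        tagListCreations++;
        List<SimpleTag> tags = new ArrayList<>();
        tags.add(new SimpleTag("TTL", Long.toString(ttl)));
        tags.add(new SimpleTag("VISIBILITY", visibilityExpr));
        // Unmodifiable, so that sharing the list between cells is safe.
        return Collections.unmodifiableList(tags);
    }

    // Pattern described as the current behaviour: one tag list per cell.
    static List<SimpleCell> cellsWithPerCellTags(String[] columns, String[] values,
                                                 long ttl, String visibilityExpr) {
        List<SimpleCell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new SimpleCell(columns[i], values[i],
                                     createTags(ttl, visibilityExpr)));
        }
        return cells;
    }

    // Proposed pattern: build the tag list once per line and reuse it.
    static List<SimpleCell> cellsWithSharedTags(String[] columns, String[] values,
                                                long ttl, String visibilityExpr) {
        List<SimpleTag> lineTags = createTags(ttl, visibilityExpr);
        List<SimpleCell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new SimpleCell(columns[i], values[i], lineTags));
        }
        return cells;
    }

    public static void main(String[] args) {
        String[] columns = {"cf:a", "cf:b", "cf:c"};
        String[] values = {"1", "2", "3"};

        tagListCreations = 0;
        cellsWithPerCellTags(columns, values, 1000L, "secret");
        System.out.println("per-cell tag list creations: " + tagListCreations);

        tagListCreations = 0;
        cellsWithSharedTags(columns, values, 1000L, "secret");
        System.out.println("per-line tag list creations: " + tagListCreations);
    }
}
```

Sharing one list between cells assumes that nothing mutates a cell's tags after
the line has been parsed, which is why the sketch returns an unmodifiable list.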