Ted Malaska created HBASE-14150:
-----------------------------------
Summary: Add BulkLoad functionality to HBase-Spark Module
Key: HBASE-14150
URL: https://issues.apache.org/jira/browse/HBASE-14150
Project: HBase
Issue Type: New Feature
Reporter: Ted Malaska
Assignee: Ted Malaska
Build on the work done in HBASE-13992 by adding the ability to do a bulk load
from a given RDD.
This will do the following:
1. Figure out the number of regions, then sort and partition the data correctly
so that it can be written out to HFiles.
2. Unlike the MR bulk load, I would like the columns to be sorted in the
shuffle stage and not in the memory of the reducer. This will allow the
design to support super wide records without running out of memory.
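
As a rough illustration of the two points above, here is a minimal sketch in
plain Python (not Spark, and not the actual patch). It assumes cells arrive as
(row, family, qualifier, value) tuples of bytes and that the region start keys
are already known and sorted. The names region_index and partition_and_sort
are hypothetical. The idea is that each cell is routed to the region that owns
its row key, and that the sort key includes the column, so ordering is done per
cell rather than by collecting a whole row in memory first.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    start_keys is the sorted list of region start keys (hypothetical input);
    the first region starts at the empty byte string, as in HBase.
    """
    return bisect_right(start_keys, row_key) - 1


def partition_and_sort(cells, start_keys):
    """Group cells by region, then sort each group by (row, family, qualifier).

    In a real job the per-partition sort would happen in the shuffle; here it
    is simulated with an in-memory sort purely to show the ordering.
    """
    partitions = defaultdict(list)
    for row, family, qualifier, value in cells:
        idx = region_index(start_keys, row)
        partitions[idx].append((row, family, qualifier, value))
    # Python compares bytes lexicographically by unsigned byte value,
    # which is the same ordering HBase uses for row keys and columns.
    return {
        idx: sorted(group, key=lambda cell: cell[:3])
        for idx, group in sorted(partitions.items())
    }


if __name__ == "__main__":
    start_keys = [b"", b"g", b"p"]  # three regions: [,g) [g,p) [p,)
    cells = [
        (b"row-z", b"cf", b"b", b"1"),
        (b"row-a", b"cf", b"z", b"2"),
        (b"row-a", b"cf", b"a", b"3"),
        (b"h", b"cf", b"q", b"4"),
    ]
    for idx, group in partition_and_sort(cells, start_keys).items():
        print(idx, group)
```

Because the column is part of the sort key, a very wide row is just a long run
of adjacent cells in the sorted output, so a writer can stream them out one at
a time.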