[ https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682059#comment-14682059 ]
Ted Malaska commented on HBASE-14150:
-------------------------------------

Cool, just reviewed. I will try to get another patch in the next couple of days.

> Add BulkLoad functionality to HBase-Spark Module
> ------------------------------------------------
>
>                 Key: HBASE-14150
>                 URL: https://issues.apache.org/jira/browse/HBASE-14150
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14150.1.patch, HBASE-14150.2.patch, HBASE-14150.3.patch, HBASE-14150.4.patch
>
>
> Add on to the work done in HBASE-13992 to add functionality to do a bulk load from a given RDD.
> This will do the following:
> 1. Figure out the number of regions, then sort and partition the data correctly to be written out to HFiles.
> 2. Unlike the MR bulk load, I would like the columns to be sorted in the shuffle stage and not in the memory of the reducer. This will allow this design to support super wide records without running out of memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
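
The two steps in the issue description can be illustrated with a small, Spark-free sketch. This is not the code from the attached patches. The function names `region_index` and `partition_and_sort`, the cell layout `(row, family, qualifier, value)`, and the sample region start keys are all hypothetical, chosen only to show the idea: route each cell to the region that owns its row key, and sort on the full (row, family, qualifier) key so that column ordering is part of the sort rather than something a reducer does in memory per row. In Spark itself, `repartitionAndSortWithinPartitions` is the usual primitive for this kind of partition-then-sort step; whether the patch uses it is not stated in this message.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    Assumes start_keys is sorted and begins with the empty key b"",
    the way an HBase table's first region has an empty start key.
    """
    return bisect_right(start_keys, row_key) - 1


def partition_and_sort(cells, start_keys):
    """Group (row, family, qualifier, value) cells by region and sort them.

    The sort key includes the family and qualifier, not just the row,
    so a very wide row never has to be collected and sorted as one unit.
    Python compares bytes lexicographically, as HBase does for keys.
    """
    partitions = defaultdict(list)
    for cell in cells:
        row = cell[0]
        partitions[region_index(start_keys, row)].append(cell)
    # Sort each region's cells by (row, family, qualifier).
    return {idx: sorted(part, key=lambda c: c[:3])
            for idx, part in partitions.items()}


# Hypothetical table with two regions: [b"", b"m") and [b"m", end).
start_keys = [b"", b"m"]
cells = [
    (b"z", b"cf", b"b", b"1"),
    (b"a", b"cf", b"q2", b"2"),
    (b"a", b"cf", b"q1", b"3"),
    (b"m", b"cf", b"a", b"4"),
]
result = partition_and_sort(cells, start_keys)
```

Each value in `result` is the ordered list of cells that would go into one region's HFile. A real implementation would do the sort inside the shuffle instead of calling `sorted` on an in-memory list, which is the point of item 2 in the description.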