[jira] [Updated] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module

Ted Malaska (JIRA) Tue, 11 Aug 2015 15:19:08 -0700

     [ https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ted Malaska updated HBASE-14150:
--------------------------------
    Attachment: HBASE-14150.5.patch

Changed a logInfo to logDebug as per Andrew's request.

As for Andrew's other request, to find a common place for the code and strings 
used in both the MR bulk load and the Spark bulk load: I would prefer to handle 
that in a separate JIRA, which Andrew said was a possibility.

I would like to talk with Andrew to figure out a good home for this common code 
and these constants.

Thanks, Andrew, for the review.

> Add BulkLoad functionality to HBase-Spark Module
> ------------------------------------------------
>
>                 Key: HBASE-14150
>                 URL: https://issues.apache.org/jira/browse/HBASE-14150
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14150.1.patch, HBASE-14150.2.patch, 
> HBASE-14150.3.patch, HBASE-14150.4.patch, HBASE-14150.5.patch
>
>
> Build on the work done in HBASE-13992 by adding the ability to do a bulk load 
> from a given RDD.
> This will do the following:
> 1. Figure out the number of regions, then sort and partition the data 
> correctly so it can be written out to HFiles.
> 2. Unlike the MR bulk load, sort the columns in the shuffle stage rather than 
> in the memory of the reducer.  This allows the design to support very wide 
> records without running out of memory.
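
The two steps in the quoted description can be illustrated with a small, 
Spark-free sketch. This is not the code from the patch: the region start keys, 
the cell tuples, and the function names `region_index` and `partition_and_sort` 
are all hypothetical. The idea is to key each cell by (row, family, qualifier), 
route it to the region whose start key is the greatest one not exceeding the 
row, and sort on the full composite key, so that column ordering falls out of 
the sort instead of requiring a whole row to be buffered in memory.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(start_keys, row):
    """Return the index of the region whose key range contains `row`.

    Assumption: `start_keys` is sorted and begins with b"", since the
    first region of an HBase table has an empty start key.
    """
    return bisect_right(start_keys, row) - 1


def partition_and_sort(cells, start_keys):
    """Group cells by target region, then sort each group.

    Each cell is a hypothetical (row, family, qualifier, value) tuple.
    Sorting on (row, family, qualifier) orders the columns as part of
    the sort itself, which is the role the shuffle plays in the design.
    """
    partitions = defaultdict(list)
    for cell in cells:
        partitions[region_index(start_keys, cell[0])].append(cell)
    return {
        idx: sorted(group, key=lambda c: c[:3])
        for idx, group in sorted(partitions.items())
    }


# Three hypothetical regions: [b"", b"g"), [b"g", b"p"), [b"p", end)
start_keys = [b"", b"g", b"p"]
cells = [
    (b"row9", b"cf", b"b", b"v1"),
    (b"apple", b"cf", b"z", b"v2"),
    (b"apple", b"cf", b"a", b"v3"),
    (b"hat", b"cf", b"q", b"v4"),
]
result = partition_and_sort(cells, start_keys)
for idx, group in result.items():
    print(idx, group)
```

In Spark itself, the analogous operation would be something like 
`repartitionAndSortWithinPartitions` with a custom partitioner built from the 
region start keys; whether the patch uses exactly that call is not stated in 
this message.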



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
