[jira] [Updated] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module

Ted Malaska (JIRA) Thu, 06 Aug 2015 14:18:07 -0700

     [ https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ted Malaska updated HBASE-14150:
--------------------------------
    Attachment: HBASE-14150.3.patch

Added the following:
1. Partitioner is now in its own class file
2. There are unit tests that test just the partitioner
3. Added unit test for multi region bulk load
  3.1 Tested that the data got into HBase and also that the right number 
of HFiles gets created
  3.2 Made sure that the partitioner works correctly for the EMPTY_START_ROW 
row key
4. Fixed some spelling
5. Added Javadoc for some function parameters that I missed
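
For readers who have not opened the patch, here is a rough sketch of what a
region-based partitioner has to do, including the EMPTY_START_ROW case from
3.2. This is not the code in HBASE-14150.3.patch (the module itself is
Scala); the function name region_index and the sample start keys are made up
for illustration, and it assumes the region start keys are sorted and begin
with the empty row key.

```python
from bisect import bisect_right

# Assumption: the first region's start key is the empty byte string,
# mirroring HBase's EMPTY_START_ROW constant.
EMPTY_START_ROW = b""


def region_index(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    start_keys is a sorted list of region start keys that begins with
    EMPTY_START_ROW. Python compares bytes lexicographically by unsigned
    byte value, the same ordering HBase uses for row keys.
    """
    # bisect_right counts the start keys that are <= row_key; the last of
    # those start keys belongs to the region that owns row_key.
    return bisect_right(start_keys, row_key) - 1


# Hypothetical table with three regions: [b"", b"g"), [b"g", b"n"), [b"n", end)
start_keys = [EMPTY_START_ROW, b"g", b"n"]
```

With these sample keys, the empty row key and b"apple" both land in region 0,
b"g" lands in region 1 (a start key is inclusive), and b"zebra" lands in
region 2.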


> Add BulkLoad functionality to HBase-Spark Module
> ------------------------------------------------
>
>                 Key: HBASE-14150
>                 URL: https://issues.apache.org/jira/browse/HBASE-14150
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>         Attachments: HBASE-14150.1.patch, HBASE-14150.2.patch, 
> HBASE-14150.3.patch
>
>
> Add on to the work done in HBASE-13992 to add functionality to do a bulk load 
> from a given RDD.
> This will do the following:
> 1. figure out the number of regions and sort and partition the data correctly 
> to be written out to HFiles
> 2. Also, unlike the MR bulk load, I would like the columns to be sorted in 
> the shuffle stage and not in the memory of the reducer.  This will allow this 
> design to support super wide records without running out of memory.
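
The two points in the description can be sketched without Spark as follows.
This is only an illustration of the idea, not the patch: flatten and
partition_and_sort are hypothetical names, an in-memory dict stands in for
the shuffle, and the sample rows and start keys are invented. The key point
is that each cell becomes its own record keyed by (row key, family,
qualifier), so the ordering is done on individual cells and no step has to
hold a whole wide row in memory to sort its columns.

```python
from bisect import bisect_right
from collections import defaultdict


def flatten(rows):
    """Yield one ((row_key, family, qualifier), value) pair per cell."""
    for row_key, cells in rows:
        for (family, qualifier), value in cells.items():
            yield (row_key, family, qualifier), value


def partition_and_sort(cells, start_keys):
    """Group cells by owning region, then sort each group by its full key.

    start_keys is a sorted list of region start keys beginning with b"".
    The dict of lists is a stand-in for the shuffle; in a real job the
    framework would do the grouping and sorting while moving the data.
    """
    partitions = defaultdict(list)
    for key, value in cells:
        # key[0] is the row key; the region is the last start key <= it.
        region = bisect_right(start_keys, key[0]) - 1
        partitions[region].append((key, value))
    return {
        region: sorted(part, key=lambda kv: kv[0])
        for region, part in sorted(partitions.items())
    }


# Hypothetical input: two rows, columns deliberately out of order.
rows = [
    (b"r2", {(b"cf", b"b"): b"2", (b"cf", b"a"): b"1"}),
    (b"a1", {(b"cf", b"z"): b"9"}),
]
result = partition_and_sort(flatten(rows), [b"", b"m"])
```

Each entry in result then holds the cells for one region in the order an
HFile writer would need them: by row key, then family, then qualifier.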



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
