[
https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694013#comment-14694013
]
Hudson commented on HBASE-14150:
--------------------------------
FAILURE: Integrated in HBase-TRUNK #6718 (See
[https://builds.apache.org/job/HBase-TRUNK/6718/])
HBASE-14150 Add BulkLoad functionality to HBase-Spark Module (Ted Malaska)
(tedyu: rev 72a48a1333f6c01c46cd244439198ccce3f941ac)
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/FamilyHFileWriteOptions.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/KeyFamilyQualifier.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/BulkLoadPartitioner.scala
* hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/BulkLoadSuite.scala
> Add BulkLoad functionality to HBase-Spark Module
> ------------------------------------------------
>
> Key: HBASE-14150
> URL: https://issues.apache.org/jira/browse/HBASE-14150
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Fix For: 2.0.0
>
> Attachments: HBASE-14150.1.patch, HBASE-14150.2.patch,
> HBASE-14150.3.patch, HBASE-14150.4.patch, HBASE-14150.5.patch
>
>
> Build on the work done in HBASE-13992 by adding the ability to bulk load
> from a given RDD.
> This will do the following:
> 1. Figure out the number of regions, then sort and partition the data correctly
> so it can be written out to HFiles.
> 2. Unlike the MR bulk load, sort the columns in the shuffle stage rather than in
> the memory of the reducer. This allows the design to support super-wide rows
> without running out of memory.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)