[ https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068549#comment-14068549 ]

Ted Malaska commented on SPARK-2447:
------------------------------------

Over the weekend I got the following done:
1. Converted most vars to vals
2. Renamed bulkGets to bulkGet, and did the same for the other bulk methods
3. Renamed the private map method to mapPartition
5. Fixed indentation on the lines where it was incorrect
6. Closed every hTable (I had missed one)
8. Changed the Configuration to be sent as a broadcast variable, to reduce IO to the 
workers and shorten start-up time
9. Stored the HConnection in a static place so that each partition on a worker doesn't 
have to create its own HConnection
10. Added a map of connections (we need to support connecting to more than one 
HBase cluster)
11. Added comments to bulkGet describing the RDD it takes in and the RDD it returns
12. The HBaseContext constructor now takes the SparkContext
13. Removed the default constructor
14. Used Spark's SerializableWritable (following HadoopRDD as an example)
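
Items 9 and 10 together amount to a per-executor connection cache keyed by cluster. The following is a minimal sketch of that idea under stated assumptions, not the SparkOnHBase code: `StubConnection`, `ConnectionCache`, and the string cluster key (for example `zk1,zk2:2181:/hbase`, built from the ZooKeeper quorum, client port, and znode parent) are hypothetical, and the stub stands in for a real HBase connection so the example runs without HBase on the classpath.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.{Function => JFunction}

// Hypothetical stand-in for an HBase HConnection, so the sketch runs without HBase.
final class StubConnection(val clusterKey: String) {
  @volatile private var closed = false
  def isClosed: Boolean = closed
  def close(): Unit = { closed = true }
}

// A Scala `object` is initialised once per classloader, so (assuming one classloader
// per executor) every task in the same executor JVM shares this map instead of
// opening its own connection.
object ConnectionCache {
  // One entry per hypothetical cluster key, so a job can reach more than one cluster.
  private val connections = new ConcurrentHashMap[String, StubConnection]()
  private val created = new AtomicInteger(0)

  // computeIfAbsent is atomic on ConcurrentHashMap: partitions asking for the same
  // cluster at the same time get the same instance, and the factory runs once per key.
  def getOrCreate(clusterKey: String): StubConnection =
    connections.computeIfAbsent(clusterKey, new JFunction[String, StubConnection] {
      override def apply(key: String): StubConnection = {
        created.incrementAndGet()
        new StubConnection(key) // real code would open an HBase connection here
      }
    })

  // Number of connections opened so far, useful for checking that reuse happens.
  def createdCount: Int = created.get()

  // Close and forget every cached connection, e.g. from a JVM shutdown hook.
  def closeAll(): Unit = {
    val it = connections.values().iterator()
    while (it.hasNext) it.next().close()
    connections.clear()
  }
}
```

In this sketch a partition function would call `ConnectionCache.getOrCreate(key)` at the start of each partition rather than opening a new connection. Because the cached connection outlives any single task, it would be closed on executor shutdown rather than at the end of each partition.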

Extra:
1. Finished the first draft of the design doc: 
https://github.com/tmalaska/SparkOnHBase/blob/master/SparkOnHBase.Design.Doc.docx
2. Built support for Spark Streaming
3. Built a Spark Streaming example for puts

> Add common solution for sending upsert actions to HBase (put, deletes, and 
> increment)
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-2447
>                 URL: https://issues.apache.org/jira/browse/SPARK-2447
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>
> Going to review the design with Tdas today.
> My first thought is to have an extension of VoidFunction that handles the 
> connection to HBase and allows options such as turning auto-flush off for 
> higher throughput.
> Need to answer the following questions first.
> - Can it be written in Java or should it be written in Scala?
> - What is the best way to add the HBase dependency? (will review how Flume 
> does this as the first option)
> - What is the best way to do testing? (will review how Flume does this as the 
> first option)
> - How to support Python? (Python may be a separate Jira; that is unknown at 
> this time)
> Goals:
> - Simple to use
> - Stable
> - Supports high load
> - Documented (may be a separate Jira; need to ask Tdas)
> - Supports Java, Scala, and hopefully Python
> - Supports Streaming and normal Spark



--
This message was sent by Atlassian JIRA
(v6.2#6252)
