[ 
https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066321#comment-14066321
 ] 

Ted Malaska commented on SPARK-2447:
------------------------------------

Code review on Thursday, July 17.

At least 14 action items to complete before the next review:
1. Convert vars to vals
2. Rename bulkGets to bulkGet, and do the same for the other bulk methods
3. Rename the private map method to mapPartition
4. Add comments to every method
5. Fix the indentation; not every line is indented correctly
6. Close every HTable (I forgot one)
7. Add unit tests for everything
8. Send the Configuration as a broadcast variable to reduce I/O to the
workers and shorten startup time
9. Store the HConnection in a static location so that partitions on the same
worker don't each have to create their own HConnection
10. Keep a map of connections (we need to be able to hold more than one
connection)
11. BulkGet needs comments describing what goes in and what comes out
12. Pass the SparkContext to the HBaseContext constructor
13. Remove the default constructor
14. Use Spark's SerializableWritable (see HadoopRDD for an example)
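Items 9 and 10 together amount to a per-JVM cache that holds one connection per cluster. The Java sketch below shows that pattern in isolation, assuming a string cluster key (for example the ZooKeeper quorum) is enough to tell connections apart. ConnectionCache and SharedConnection are hypothetical names standing in for HConnection and whatever holder HBaseContext ends up using; they are not HBase API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of items 9 and 10: one shared connection per cluster per JVM.
 * SharedConnection is a placeholder for HConnection; this is NOT the HBase API.
 */
public class ConnectionCache {

    /** Placeholder for a heavyweight, thread-safe connection object. */
    public static final class SharedConnection {
        final String clusterKey;

        SharedConnection(String clusterKey) {
            this.clusterKey = clusterKey;
        }
    }

    // Static, so every task (partition) running in the same executor JVM sees the same map.
    private static final Map<String, SharedConnection> CONNECTIONS = new ConcurrentHashMap<>();

    // Counts how many connections were actually opened, to make reuse observable.
    private static final AtomicInteger CREATED = new AtomicInteger();

    /** Returns the cached connection for clusterKey, creating it at most once. */
    public static SharedConnection get(String clusterKey) {
        return CONNECTIONS.computeIfAbsent(clusterKey, key -> {
            CREATED.incrementAndGet();
            return new SharedConnection(key);
        });
    }

    public static int createdCount() {
        return CREATED.get();
    }

    public static void main(String[] args) {
        // Simulate three partitions on one worker: two use cluster A, one uses cluster B.
        SharedConnection p1 = get("zk-a:2181");
        SharedConnection p2 = get("zk-a:2181");
        SharedConnection p3 = get("zk-b:2181");
        System.out.println(p1 == p2);       // true: same cluster, same connection
        System.out.println(p1 == p3);       // false: different cluster
        System.out.println(createdCount()); // 2
    }
}
```

Because the map is static, every partition scheduled on the same executor reuses one connection per cluster key instead of opening its own. Closing those connections when the executor shuts down is not shown here.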

> Add common solution for sending upsert actions to HBase (put, deletes, and 
> increment)
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-2447
>                 URL: https://issues.apache.org/jira/browse/SPARK-2447
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>
> Going to review the design with Tdas today.
> My first thought is to have an extension of VoidFunction that handles the
> connection to HBase and allows for options such as turning auto flush off for
> higher throughput.
> We need to answer the following questions first:
> - Can it be written in Java or should it be written in Scala?
> - What is the best way to add the HBase dependency? (will review how Flume 
> does this as the first option)
> - What is the best way to do testing? (will review how Flume does this as the 
> first option)
> - How do we support Python? (Python may be a separate Jira; this is unknown
> at this time)
> Goals:
> - Simple to use
> - Stable
> - Supports high load
> - Documented (may be a separate Jira; need to ask Tdas)
> - Supports Java, Scala, and hopefully Python
> - Supports Streaming and normal Spark
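Turning auto flush off for higher throughput comes down to buffering writes within a partition and sending them in batches. The Java sketch below shows that pattern on its own, assuming a simple batch-size threshold. BufferedPartitionWriter and its Consumer sink are hypothetical stand-ins for the per-partition function and the HBase table call; they are not the HBase client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of "auto flush off": buffer writes and send them in batches.
 * The sink stands in for an HBase client call; this is NOT the HBase API.
 */
public class BufferedPartitionWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();
    private int flushes = 0;

    public BufferedPartitionWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** A write is only sent to the sink once the buffer reaches batchSize. */
    public void write(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered, if anything. */
    public void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(new ArrayList<>(buffer)); // hand over a copy so the buffer can be reused
        buffer.clear();
        flushes++;
    }

    public int flushCount() {
        return flushes;
    }

    /** Processes one partition's records, as a foreachPartition-style function would. */
    public static <R> int writePartition(Iterator<R> records, int batchSize, Consumer<List<R>> sink) {
        BufferedPartitionWriter<R> writer = new BufferedPartitionWriter<>(batchSize, sink);
        while (records.hasNext()) {
            writer.write(records.next());
        }
        writer.flush(); // final flush so a partial last batch is not lost
        return writer.flushCount();
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        int calls = writePartition(rows.iterator(), 2, batches::add);
        System.out.println(calls);   // 3
        System.out.println(batches); // [[r1, r2], [r3, r4], [r5]]
    }
}
```

The final flush after the loop matters: without it, the last partial batch of each partition would be silently dropped.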



--
This message was sent by Atlassian JIRA
(v6.2#6252)
