[jira] [Commented] (KUDU-1214) Add Integration points for Spark, Spark Streaming, and Spark SQL

Ted Malaska (JIRA) Mon, 07 Mar 2016 13:08:55 -0800

    [ 
https://issues.apache.org/jira/browse/KUDU-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183711#comment-15183711
 ]


Ted Malaska commented on KUDU-1214:
-----------------------------------

Hence the need for a tested common implementation.

In my patch, I send only the master URI to the executors. Each executor then 
creates its own Kudu client and holds on to it in a static location, so that 
it can be shared by all the tasks running on that executor and can outlive a 
single Spark Streaming iteration.
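
As a rough illustration of the executor-side caching described above (a sketch 
only, not the code in the attached patch): the class below keeps one client per 
master address in a static map, so every task in the same JVM reuses it. 
StubClient, ExecutorClientCache, getOrCreate and the address "kudu-master:7051" 
are hypothetical names used for illustration; a real version would presumably 
build an actual Kudu client where the stub is constructed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorClientCache {

    /** Hypothetical stand-in for a real Kudu client; it only records the master address. */
    public static final class StubClient {
        /** Counts how many stub clients this JVM has constructed. */
        public static final AtomicInteger CREATED = new AtomicInteger();
        public final String masterAddress;

        StubClient(String masterAddress) {
            this.masterAddress = masterAddress;
            CREATED.incrementAndGet();
        }
    }

    // Static, so there is one map per JVM (one per executor) and it outlives
    // any single task or streaming batch.
    private static final ConcurrentHashMap<String, StubClient> CLIENTS =
            new ConcurrentHashMap<>();

    /** Returns the cached client for this master address, creating it on first use. */
    public static StubClient getOrCreate(String masterAddress) {
        // computeIfAbsent runs the constructor at most once per key,
        // even when several tasks call it concurrently.
        return CLIENTS.computeIfAbsent(masterAddress, StubClient::new);
    }

    public static void main(String[] args) throws InterruptedException {
        String master = "kudu-master:7051"; // hypothetical address, the only thing shipped to tasks
        // Two simulated streaming batches, each with four concurrent "tasks".
        for (int batch = 0; batch < 2; batch++) {
            Thread[] tasks = new Thread[4];
            for (int i = 0; i < tasks.length; i++) {
                tasks[i] = new Thread(() -> getOrCreate(master));
                tasks[i].start();
            }
            for (Thread t : tasks) {
                t.join();
            }
        }
        System.out.println("clients created: " + StubClient.CREATED.get());
    }
}
```

Only the address string crosses from the driver to the tasks, which avoids 
having to serialize a client object; the first task to run on an executor pays 
the connection cost and later tasks and batches reuse the cached instance.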

So with that, we have a need for number 1, and I believe we have a need for 
number 2 as well.  The design I'm planning to build has been in HBase for some 
time and has been used in production, but I'm totally open to review if there 
is a better way.





> Add Integration points for Spark, Spark Streaming, and Spark SQL
> ----------------------------------------------------------------
>
>                 Key: KUDU-1214
>                 URL: https://issues.apache.org/jira/browse/KUDU-1214
>             Project: Kudu
>          Issue Type: New Feature
>          Components: integration
>            Reporter: Ted Malaska
>         Attachments: KUDU-1214.1.patch
>
>
> This JIRA will be broken up into four main JIRAs:
> 1. Add Support for Spark RDD map and foreach integration with Kudu
> 2. Add Support for Spark DStream map and foreach integration with Kudu
> 3. Add Support for Spark SQL defaultSource and push down predicates
> 4. Add documentation for all Spark Integrations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
