[jira] [Commented] (KUDU-1214) Add Integration points for Spark, Spark Streaming, and Spark SQL

Todd Lipcon (JIRA) Mon, 07 Mar 2016 08:53:48 -0800

    [ 
https://issues.apache.org/jira/browse/KUDU-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183252#comment-15183252
 ]


Todd Lipcon commented on KUDU-1214:
-----------------------------------

Let's split your six items into separate, specific pieces. One of the reasons 
we cut down the API in the original patch was that it's much more manageable to 
review and integrate this stuff in stages. We've also typically been hesitant 
to add a lot of public APIs to Kudu itself -- it's like letting toothpaste out 
of the tube. Once a public API is out, you can't take it back. So, my 
preference is to err on the side of "don't need it" in cases where the user can 
easily wrap our existing APIs to provide what they need.

My recollection of several of the APIs mentioned above is that they were simply 
wrappers around the normal RDD foreach/map/etc which automatically created a 
KuduClient object. Imran from the Spark team ("Anonymous Coward" on 
http://gerrit.cloudera.org:8080/#/c/1788/) also wasn't quite sure if these APIs 
were the best way to express the functionality. Could you explain in a little 
more detail what functionality is missing from the normal "rdd.map" API exposed 
by Spark? Why not just create a Kudu client within your map function? Maybe you 
can provide a 'before/after' example to motivate how the API makes the user's 
life easier?
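
As an illustration of the "create a client within your map function" approach 
asked about above, here is a minimal sketch in Python. It is not taken from the 
patch under review. FakeKuduClient, its lookup method, the master address, and 
run_locally are all hypothetical stand-ins so that the example runs without 
Spark or Kudu installed; only the shape of enrich_partition (iterator in, 
iterator out) matches what Spark's mapPartitions expects.

```python
from typing import Iterable, Iterator, List, Tuple


class FakeKuduClient:
    """Hypothetical stand-in for a real Kudu client; not part of any Kudu API."""

    instances_created = 0  # counts constructed clients, for illustration only

    def __init__(self, master_addresses: str) -> None:
        self.master_addresses = master_addresses
        self.closed = False
        FakeKuduClient.instances_created += 1

    def lookup(self, key: int) -> str:
        # A real client would read a row from a Kudu table here.
        return f"value-for-{key}"

    def close(self) -> None:
        self.closed = True


def enrich_partition(rows: Iterable[int]) -> Iterator[Tuple[int, str]]:
    """Same shape as a function passed to rdd.mapPartitions."""
    # Assumed address; one client is built per partition, not per row.
    client = FakeKuduClient("kudu-master.example.com:7051")
    try:
        for key in rows:
            yield key, client.lookup(key)
    finally:
        # Runs once the partition's iterator is exhausted.
        client.close()


def run_locally(partitions: List[List[int]]) -> List[Tuple[int, str]]:
    """Simulate Spark applying the function to each partition in turn."""
    out: List[Tuple[int, str]] = []
    for part in partitions:
        out.extend(enrich_partition(iter(part)))
    return out
```

With a real cluster, the same function would presumably be passed as 
rdd.mapPartitions(enrich_partition), with the stand-in swapped for an actual 
client. Whether a dedicated wrapper API saves enough over this pattern is the 
open question in this comment.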

> Add Integration points for Spark, Spark Streaming, and Spark SQL
> ----------------------------------------------------------------
>
>                 Key: KUDU-1214
>                 URL: https://issues.apache.org/jira/browse/KUDU-1214
>             Project: Kudu
>          Issue Type: New Feature
>          Components: integration
>            Reporter: Ted Malaska
>         Attachments: KUDU-1214.1.patch
>
>
> This JIRA will be broken up into four main JIRAs:
> 1. Add Support for Spark RDD map and foreach integration with Kudu
> 2. Add Support for Spark DStream map and foreach integration with Kudu
> 3. Add Support for Spark SQL defaultSource and push down predicates
> 4. Add documentation for all Spark Integrations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
