Re: Intro & kudu-spark support for persisting DataFrames

Dan Burkert Thu, 05 May 2016 18:18:06 -0700

Hey Andy,

Thanks for the patch!  I left you some specific feedback on the gerrit
review, but I want to discuss the high level approach a bit.  I think the
patch as it's written now is going to have limited use, because it doesn't
allow for specifying primary keys or partitioning, which are critical for
correctness and performance. In the long run we will definitely want to be
able to create tables through Spark SQL, but perhaps we should start off
with just inserting/updating rows in existing tables.  It would be
interesting to see how other databases solved this problem, since I'm sure
we're not the only ones with configuration options at table creation.  The
relational databases in particular must have PK options.


- Dan

On Thu, May 5, 2016 at 5:51 PM, Andy Grove <[email protected]> wrote:

> Hi,
>
> I'm working with some colleagues at AgilData on Spark/Kudu integration and
> we expect to be able to contribute a number of features to the code base.
>
> To kick things off, here is a gerrit for discussion that adds support for
> persisting a DataFrame to a Kudu table. It would be great to hear feedback
> and feature requests for this capability.
>
> http://gerrit.cloudera.org:8080/#/c/2969/
>
> Thanks,
>
> Andy.
>
