[ https://issues.apache.org/jira/browse/KUDU-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280648#comment-16280648 ]
Dan Burkert commented on KUDU-2235:
-----------------------------------
I don't think so. The only Spark documentation I'm aware of is http://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark, and that doesn't cover it. We should leave this JIRA open to track adding docs for the feature.

> Spark SQL insert command is actually an upsert
> ----------------------------------------------
>
>                 Key: KUDU-2235
>                 URL: https://issues.apache.org/jira/browse/KUDU-2235
>             Project: Kudu
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.5.0
>        Environment: CDH 5.13
>            Reporter: Diana Carroll
>
> The Spark SQL 'INSERT' command actually performs an upsert when used on a
> Kudu table.
> Example:
> 1) Create a table in Impala like this:
> {code}
> create table test13 (k1 string, c2 string, c3 string, primary key(k1))
> partition by hash partitions 2 stored as kudu
> {code}
> 2) Run an Impala INSERT to demonstrate correct insert behavior:
> {code}
> insert into test13 values ('x','x','x'),('y','y','y');
> Modified 2 row(s), 0 row error(s) in 3.70s
>
> select * from test13;
> +----+----+----+
> | k1 | c2 | c3 |
> +----+----+----+
> | x  | x  | x  |
> | y  | y  | y  |
> +----+----+----+
>
> insert into test13 values ('x','x','test insert'),('z','z','z');
> WARNINGS: Key already present in Kudu table 'impala::default.test13'.
> Modified 1 row(s), 1 row error(s) in 0.11s
>
> select * from test13;
> +----+----+----+
> | k1 | c2 | c3 |
> +----+----+----+
> | x  | x  | x  |
> | y  | y  | y  |
> | z  | z  | z  |
> +----+----+----+
> {code}
> 3) Run the same sequence of operations in Spark (Scala):
> {code}
> scala> val test13 = spark.read.format("org.apache.kudu.spark.kudu").option("kudu.master", kuduMaster).option("kudu.table", "impala::default.test13").load
> scala> test13.createTempView("test13")
> scala> spark.sql("insert into test13 values ('a','a','a'),('c','c','c')")
> scala> test13.show
> +---+---+---+
> | k1| c2| c3|
> +---+---+---+
> |  a|  a|  a|
> |  c|  c|  c|
> +---+---+---+
> scala> spark.sql("insert into test13 values ('a','a','test update'),('d','d','d')")
> scala> test13.show
> +---+---+-----------+
> | k1| c2|         c3|
> +---+---+-----------+
> |  a|  a|test update|
> |  c|  c|          c|
> |  d|  d|          d|
> +---+---+-----------+
> {code}
> Note that in Spark, but not in Impala, the row matching the existing key was
> changed (updated), and the row with the new key was added. Neither should
> happen with an insert. 'Upsert' isn't actually a valid command.
> This important difference between Impala SQL and Spark SQL with respect to
> Kudu is confusing.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
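Until the docs cover this, one way to get true insert semantics (reject rows whose key already exists rather than update them) is to bypass Spark SQL's INSERT and use the kudu-spark `KuduContext` API directly. A minimal sketch for the spark-shell, assuming a reachable Kudu master (`kuduMaster`) and the `impala::default.test13` table from the example above:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import spark.implicits._  // already in scope in spark-shell

// KuduContext lets us choose the write operation explicitly,
// unlike the DataSource path behind Spark SQL's INSERT.
val kuduContext = new KuduContext(kuduMaster, spark.sparkContext)

val newRows = Seq(("a", "a", "test update"), ("d", "d", "d"))
  .toDF("k1", "c2", "c3")

// insertRows issues true INSERT operations: the row with the
// existing key 'a' is reported as an error instead of being
// silently updated, while the new row 'd' is inserted.
kuduContext.insertRows(newRows, "impala::default.test13")
```

There is a matching `kuduContext.upsertRows(...)` for when upsert is the behavior actually wanted, which makes the intent explicit rather than relying on INSERT's surprising default.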