Burak Yavuz created SPARK-31178:
-----------------------------------

             Summary: sql("INSERT INTO v2DataSource ...").collect() double inserts
                 Key: SPARK-31178
                 URL: https://issues.apache.org/jira/browse/SPARK-31178
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Burak Yavuz

The following unit test fails in DataSourceV2SQLSuite:

{code:java}
test("do not double insert on INSERT INTO collect()") {
  import testImplicits._
  val t1 = s"${catalogAndNamespace}tbl"
  sql(s"CREATE TABLE $t1 (id bigint, data string) USING $v2Format")
  val tmpView = "test_data"
  val df = Seq((1L, "a"), (2L, "b"), (3L, "c")).toDF("id", "data")
  df.createOrReplaceTempView(tmpView)
  sql(s"INSERT INTO TABLE $t1 SELECT * FROM $tmpView").collect()

  // The table should contain exactly the rows of df, each inserted once.
  verifyTable(t1, df)
}
{code}

The INSERT INTO inserts the data twice when ".collect()" is called. I think this is because the V2 SparkPlans are not commands, and doExecute on a SparkPlan can be called multiple times, causing the data to be inserted multiple times.
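
To illustrate the suspected cause, here is a minimal sketch in plain Scala with no Spark dependency. Every name in it (ToyTable, ToyPlan, NaiveAppend, CommandStyleAppend, DoubleInsertSketch) is hypothetical and only models the idea. It is not Spark's actual code and not a proposed fix. It assumes the plan is executed twice and compares a node that writes inside doExecute with one that caches its side effect so the write runs at most once.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for a writable table; NOT a Spark class.
final class ToyTable {
  val rows: ArrayBuffer[(Long, String)] = ArrayBuffer.empty
}

// Toy stand-in for a physical plan node; NOT Spark's SparkPlan.
trait ToyPlan {
  def doExecute(): Seq[(Long, String)]
}

// The write happens inside doExecute, so every call appends the input again.
final class NaiveAppend(table: ToyTable, input: Seq[(Long, String)]) extends ToyPlan {
  def doExecute(): Seq[(Long, String)] = {
    table.rows ++= input
    Seq.empty
  }
}

// Command-style node: the side effect lives in a lazy val, so it runs
// at most once no matter how many times doExecute is called.
final class CommandStyleAppend(table: ToyTable, input: Seq[(Long, String)]) extends ToyPlan {
  private lazy val sideEffectResult: Seq[(Long, String)] = {
    table.rows ++= input
    Seq.empty
  }
  def doExecute(): Seq[(Long, String)] = sideEffectResult
}

object DoubleInsertSketch {
  val data: Seq[(Long, String)] = Seq((1L, "a"), (2L, "b"), (3L, "c"))

  // Assumption: the same plan instance ends up being executed twice.
  private def runTwice(plan: ToyPlan): Unit = {
    plan.doExecute() // first execution
    plan.doExecute() // second execution, e.g. triggered by a later collect()
  }

  // Number of rows in the table after running the naive node twice.
  def naiveRowCount(): Int = {
    val table = new ToyTable
    runTwice(new NaiveAppend(table, data))
    table.rows.size
  }

  // Number of rows in the table after running the command-style node twice.
  def commandRowCount(): Int = {
    val table = new ToyTable
    runTwice(new CommandStyleAppend(table, data))
    table.rows.size
  }
}
```

In this toy model the naive node leaves six rows in the table after two executions, while the command-style node leaves three. Whether the real V2 write plans behave like the naive node is the hypothesis stated above, not something this sketch demonstrates.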