[
https://issues.apache.org/jira/browse/SPARK-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-9278:
--------------------------------
Comment: was deleted
(was: The result may well be different, since I ran the code below against the
master branch of Spark in a local environment, without S3, using the Scala API on Mac OS.
Still, I will leave this comment describing what I tested, in case you want to
test outside the reported environment.
Here is the code I ran:
{code}
// Imports needed beyond the spark-shell defaults (sc and sqlContext are provided by the shell).
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Create data: 15 rows of (k, pk, v), five rows per partition value.
val alphabets = Seq("a", "e", "i", "o", "u")
val partA = (0 to 4).map(i => Seq(alphabets(i % 5), "a", i))
val partB = (5 to 9).map(i => Seq(alphabets(i % 5), "b", i))
val partC = (10 to 14).map(i => Seq(alphabets(i % 5), "c", i))
val data = partA ++ partB ++ partC

// Create RDD.
val rowsRDD = sc.parallelize(data.map(Row.fromSeq))

// Create DataFrame.
val schema = StructType(List(
  StructField("k", StringType, true),
  StructField("pk", StringType, true),
  StructField("v", IntegerType, true)
))
val sdf = sqlContext.createDataFrame(rowsRDD, schema)

// Create an empty table partitioned by pk.
sdf.filter("FALSE")
  .write
  .format("parquet")
  .option("path", "foo")
  .partitionBy("pk")
  .saveAsTable("foo")

// Insert the rows of partition pk = 'a' into the table.
sdf.filter("pk = 'a'")
  .write
  .partitionBy("pk")
  .insertInto("foo")

// Select all.
val foo = sqlContext.table("foo")
foo.show()
{code}
The result was correct, as shown below.
{code}
+---+---+---+
|  k|  v| pk|
+---+---+---+
|  a|  0|  a|
|  e|  1|  a|
|  i|  2|  a|
|  o|  3|  a|
|  u|  4|  a|
+---+---+---+
{code})
> DataFrameWriter.insertInto inserts incorrect data
> -------------------------------------------------
>
> Key: SPARK-9278
> URL: https://issues.apache.org/jira/browse/SPARK-9278
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Environment: Linux, S3, Hive Metastore
> Reporter: Steve Lindemann
> Assignee: Cheng Lian
> Priority: Critical
>
> After creating a partitioned Hive table (stored as Parquet) via the
> DataFrameWriter.createTable command, subsequent attempts to insert additional
> data into new partitions of this table result in inserting incorrect data
> rows. Reordering the columns in the data to be written seems to avoid this
> issue.
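
The column-reordering workaround mentioned in the description could be sketched roughly as follows. This is only an illustration, assuming the table stores its partition column last (as the k, v, pk output in the deleted comment suggests) and that insertInto matches columns by position rather than by name. The helper name partitionColumnsLast is hypothetical and not part of Spark.

```scala
// Hypothetical helper (not a Spark API): move the partition columns to the end,
// keeping the remaining data columns in their original relative order.
def partitionColumnsLast(columns: Seq[String], partitionCols: Seq[String]): Seq[String] = {
  val (parts, data) = columns.partition(c => partitionCols.contains(c))
  // Emit partition columns in the order given by partitionCols.
  data ++ partitionCols.filter(p => parts.contains(p))
}

// DataFrame column order from the comment's schema: k, pk, v.
val ordered = partitionColumnsLast(Seq("k", "pk", "v"), Seq("pk"))
println(ordered.mkString(", "))

// With the DataFrame `sdf` from the comment, the reordered insert would then be:
//   sdf.select(ordered.head, ordered.tail: _*).write.insertInto("foo")
```

Selecting the columns in table order before the insert keeps positions and names aligned, which is presumably why reordering avoided the problem for the reporter.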
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)