Reynold Xin updated SPARK-18544:
--------------------------------
Assignee: Eric Liang
> Append with df.saveAsTable writes data to wrong location
> --------------------------------------------------------
>
> Key: SPARK-18544
> URL: https://issues.apache.org/jira/browse/SPARK-18544
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Eric Liang
> Assignee: Eric Liang
> Priority: Blocker
> Fix For: 2.1.0
>
>
> When using saveAsTable in append mode, data will be written to the wrong
> location for non-managed Datasource tables. The following example illustrates
> this.
> It seems DataFrameWriter somehow passes the wrong table path to
> InsertIntoHadoopFsRelation. We should probably also remove the repair-table call at
> the end of saveAsTable in DataFrameWriter; that shouldn't be needed in either the
> Hive or the Datasource case.
> {code}
> scala> spark.sqlContext.range(100).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test")
> scala> sql("create table test (id long, A int, B int) USING parquet OPTIONS (path '/tmp/test') PARTITIONED BY (A, B)")
> scala> sql("msck repair table test")
> scala> sql("select * from test where A = 1").count
> res6: Long = 1
> scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").saveAsTable("test")
> scala> sql("select * from test where A = 1").count
> res8: Long = 1
> {code}
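> A quick way to see where the appended rows actually landed is to read the external
> path directly and compare counts. A minimal sketch, run in the same spark-shell
> session right after the steps above (the expectation of 110 rows simply assumes the
> 10 appended rows should have gone under /tmp/test alongside the original 100):
> {code}
> scala> // the table is registered against the external path /tmp/test;
> scala> // the detailed output below includes the table's Location
> scala> sql("describe formatted test").show(100, false)
>
> scala> // rows physically present under the external path; a correct append
> scala> // would make this 110
> scala> spark.read.parquet("/tmp/test").count
>
> scala> // the table itself still only sees the original row for A = 1
> scala> sql("select * from test where A = 1").count
> {code}
> If the count read straight from /tmp/test comes back as 100, the appended files were
> written somewhere else (presumably a derived default location such as the warehouse
> directory), which would match the symptom above.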