Reynold Xin updated SPARK-18544:
--------------------------------
Assignee: Eric Liang
> Append with df.saveAsTable writes data to wrong location
> --------------------------------------------------------
>
> Key: SPARK-18544
> URL: https://issues.apache.org/jira/browse/SPARK-18544
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Eric Liang
> Assignee: Eric Liang
> Priority: Blocker
> Fix For: 2.1.0
>
>
> When using saveAsTable in append mode, data will be written to the wrong
> location for non-managed Datasource tables. The following example illustrates
> this.
> It seems DataFrameWriter somehow passes the wrong table path to
> InsertIntoHadoopFsRelation. We should probably also remove the repair-table call at
> the end of saveAsTable in DataFrameWriter; that shouldn't be needed in either the
> Hive or the Datasource case.
> {code}
> scala> spark.sqlContext.range(100).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test")
> scala> sql("create table test (id long, A int, B int) USING parquet OPTIONS (path '/tmp/test') PARTITIONED BY (A, B)")
> scala> sql("msck repair table test")
> scala> sql("select * from test where A = 1").count
> res6: Long = 1
> scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").saveAsTable("test")
> scala> sql("select * from test where A = 1").count
> res8: Long = 1
> {code}
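> A quick way to see where the appended rows actually landed is to read the external
> path directly and compare counts. A minimal sketch, run in the same spark-shell
> session right after the steps above (the expectation of 110 rows simply assumes the
> 10 appended rows should have gone under /tmp/test alongside the original 100):
> {code}
> scala> // the table is registered against the external path /tmp/test;
> scala> // the detailed output below includes the table's Location
> scala> sql("describe formatted test").show(100, false)
>
> scala> // rows physically present under the external path; a correct append
> scala> // would make this 110
> scala> spark.read.parquet("/tmp/test").count
>
> scala> // the table itself still only sees the original row for A = 1
> scala> sql("select * from test where A = 1").count
> {code}
> If the count read straight from /tmp/test comes back as 100, the appended files were
> written somewhere else (presumably a derived default location such as the warehouse
> directory), which would match the symptom above.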