[
https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267576#comment-15267576
]
Xin Wu commented on SPARK-14927:
--------------------------------
Right now, when a datasource table is created with partitions, it is not a Hive-compatible table.
So you may need to create the table explicitly, like {code}create table tmp.tmp1 (val string) partitioned by (year int) stored as parquet location '....'{code}
Then insert into the table from a temp table derived from the DataFrame. Here is something I tried:
{code}
scala> df.show
+----+---+
|year|val|
+----+---+
|2012| a|
|2013| b|
|2014| c|
+----+---+
scala> val df1 = spark.sql("select * from t000 where year = 2012")
df1: org.apache.spark.sql.DataFrame = [year: int, val: string]
scala> df1.registerTempTable("df1")
scala> spark.sql("insert into tmp.ptest3 partition(year=2012) select val from df1")
scala> val df2 = spark.sql("select * from t000 where year = 2013")
df2: org.apache.spark.sql.DataFrame = [year: int, val: string]
scala> df2.registerTempTable("df2")
scala> spark.sql("insert into tmp.ptest3 partition(year=2013) select val from df2")
16/05/02 14:47:34 WARN log: Updating partition stats fast for: ptest3
16/05/02 14:47:34 WARN log: Updated size to 327
res54: org.apache.spark.sql.DataFrame = []
scala> spark.sql("show partitions tmp.ptest3").show
+---------+
| result|
+---------+
|year=2012|
|year=2013|
+---------+
{code}
This is a bit hacky, though; I hope someone has a better solution for your problem. Also, this was on Spark 2.0. Try whether 1.6 accepts the same approach.
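To avoid repeating one insert per partition value by hand, the same idea can be written with Hive dynamic partitioning (the two configs the reporter already had commented out). A rough sketch, assuming Spark 2.0 with Hive support, the same pre-created partitioned table tmp.ptest3, and a DataFrame df with columns (year, val):

{code}
// Sketch only: enable Hive dynamic partitioning so a single INSERT
// populates all year= partitions at once, instead of one statement
// per partition value.
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

// Expose the DataFrame to SQL (df is assumed to hold (year, val) rows).
df.registerTempTable("t000")

// With a dynamic partition spec, the partition column must come
// last in the select list.
spark.sql("insert into tmp.ptest3 partition(year) select val, year from t000")

spark.sql("show partitions tmp.ptest3").show()
{code}

This keeps the Hive-compatible table from the workaround above but replaces the per-partition inserts with one statement; I have not verified it on 1.6, where the dynamic-partition behavior may differ.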
> DataFrame.saveAsTable creates RDD partitions but not Hive partitions
> ---------------------------------------------------------------------
>
> Key: SPARK-14927
> URL: https://issues.apache.org/jira/browse/SPARK-14927
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 1.6.1
> Environment: Mac OS X 10.11.4 local
> Reporter: Sasha Ovsankin
>
> This is a followup to
> http://stackoverflow.com/questions/31341498/save-spark-dataframe-as-dynamic-partitioned-table-in-hive
> . I tried to use the suggestions in the answers but couldn't make them work in
> Spark 1.6.1.
> I am trying to create partitions programmatically from a `DataFrame`. Here is
> the relevant code (adapted from a Spark test):
> hc.setConf("hive.metastore.warehouse.dir", "tmp/tests")
> // hc.setConf("hive.exec.dynamic.partition", "true")
> // hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
> hc.sql("create database if not exists tmp")
> hc.sql("drop table if exists tmp.partitiontest1")
> Seq(2012 -> "a").toDF("year", "val")
> .write
> .partitionBy("year")
> .mode(SaveMode.Append)
> .saveAsTable("tmp.partitiontest1")
> hc.sql("show partitions tmp.partitiontest1").show
> Full file is here:
> https://gist.github.com/SashaOv/7c65f03a51c7e8f9c9e018cd42aa4c4a
> I get the error that the table is not partitioned:
> ======================
> HIVE FAILURE OUTPUT
> ======================
> SET hive.support.sql11.reserved.keywords=false
> SET hive.metastore.warehouse.dir=tmp/tests
> OK
> OK
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask. Table tmp.partitiontest1 is not a
> partitioned table
> ======================
> It looks like the root cause is that
> `org.apache.spark.sql.hive.HiveMetastoreCatalog.newSparkSQLSpecificMetastoreTable`
> always creates table with empty partitions.
> Any help to move this forward is appreciated.