Chaozhong Yang created SPARK-21687:
--------------------------------------

             Summary: Spark SQL should set createTime for Hive partition
                 Key: SPARK-21687
                 URL: https://issues.apache.org/jira/browse/SPARK-21687
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0, 2.1.0
            Reporter: Chaozhong Yang
             Fix For: 2.3.0
In Spark SQL, we often use `insert overwrite table t partition(p=xx)` to create partitions for a partitioned table. `createTime` is important metadata for managing the data lifecycle, e.g. applying a TTL policy. However, we found that Spark SQL never calls `setCreateTime` in `HiveClientImpl#toHivePartition`, so partitions created this way end up with no creation timestamp in the Hive metastore:

{code:scala}
def toHivePartition(
    p: CatalogTablePartition,
    ht: HiveTable): HivePartition = {
  val tpart = new org.apache.hadoop.hive.metastore.api.Partition
  val partValues = ht.getPartCols.asScala.map { hc =>
    p.spec.get(hc.getName).getOrElse {
      throw new IllegalArgumentException(
        s"Partition spec is missing a value for column '${hc.getName}': ${p.spec}")
    }
  }
  val storageDesc = new StorageDescriptor
  val serdeInfo = new SerDeInfo
  p.storage.locationUri.map(CatalogUtils.URIToString(_)).foreach(storageDesc.setLocation)
  p.storage.inputFormat.foreach(storageDesc.setInputFormat)
  p.storage.outputFormat.foreach(storageDesc.setOutputFormat)
  p.storage.serde.foreach(serdeInfo.setSerializationLib)
  serdeInfo.setParameters(p.storage.properties.asJava)
  storageDesc.setSerdeInfo(serdeInfo)
  tpart.setDbName(ht.getDbName)
  tpart.setTableName(ht.getTableName)
  tpart.setValues(partValues.asJava)
  tpart.setSd(storageDesc)
  new HivePartition(ht, tpart)
}
{code}
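One possible shape of a fix (a hypothetical sketch, not the merged patch): the Thrift-generated `org.apache.hadoop.hive.metastore.api.Partition` exposes `setCreateTime(int)`, taking seconds since the epoch, so the timestamp could be populated just before the `HivePartition` is constructed:

{code:scala}
  // ... existing toHivePartition body above ...
  tpart.setSd(storageDesc)
  // Sketch: record the partition creation time, as Hive's own DDL path does.
  // Hive stores this as an Int holding seconds since the epoch.
  tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
  new HivePartition(ht, tpart)
{code}

Whether the value should come from the current time or be carried through `CatalogTablePartition` is an open design question for the patch.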