Chaozhong Yang created SPARK-21687:
--------------------------------------
Summary: Spark SQL should set createTime for Hive partition
Key: SPARK-21687
URL: https://issues.apache.org/jira/browse/SPARK-21687
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.0, 2.1.0
Reporter: Chaozhong Yang
Fix For: 2.3.0
In Spark SQL, we often use `insert overwrite table t partition(p=xx)` to create
a partition in a partitioned table. `createTime` is an important piece of
information for managing the data lifecycle, e.g. TTL.
However, we found that Spark SQL doesn't call `setCreateTime` in
`HiveClientImpl#toHivePartition`, which currently reads as follows:
{code:scala}
def toHivePartition(
    p: CatalogTablePartition,
    ht: HiveTable): HivePartition = {
  val tpart = new org.apache.hadoop.hive.metastore.api.Partition
  val partValues = ht.getPartCols.asScala.map { hc =>
    p.spec.get(hc.getName).getOrElse {
      throw new IllegalArgumentException(
        s"Partition spec is missing a value for column '${hc.getName}': ${p.spec}")
    }
  }
  val storageDesc = new StorageDescriptor
  val serdeInfo = new SerDeInfo
  p.storage.locationUri.map(CatalogUtils.URIToString(_)).foreach(storageDesc.setLocation)
  p.storage.inputFormat.foreach(storageDesc.setInputFormat)
  p.storage.outputFormat.foreach(storageDesc.setOutputFormat)
  p.storage.serde.foreach(serdeInfo.setSerializationLib)
  serdeInfo.setParameters(p.storage.properties.asJava)
  storageDesc.setSerdeInfo(serdeInfo)
  tpart.setDbName(ht.getDbName)
  tpart.setTableName(ht.getTableName)
  tpart.setValues(partValues.asJava)
  tpart.setSd(storageDesc)
  new HivePartition(ht, tpart)
}
{code}
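One possible direction, offered only as a rough sketch and not as the eventual patch: stamp the Thrift `Partition` with the current time before it is wrapped in a `HivePartition`. The metastore `Partition` stores `createTime` as an `Int` of seconds since the epoch, so a millisecond clock value has to be converted first. The object name `PartitionCreateTime` and the helpers `toEpochSeconds` and `stampCreateTime` are hypothetical names introduced for illustration, and the snippet assumes the Hive metastore classes are on the classpath.

```scala
import java.util.concurrent.TimeUnit

import org.apache.hadoop.hive.metastore.api.Partition

// Hypothetical helper object, not part of Spark or Hive.
object PartitionCreateTime {

  // Convert epoch milliseconds to the Int seconds value that the
  // Thrift-generated Partition#setCreateTime(int) expects.
  def toEpochSeconds(millis: Long): Int =
    TimeUnit.MILLISECONDS.toSeconds(millis).toInt

  // Set createTime on the given partition and return it. The default
  // timestamp is "now"; callers may pass a fixed value instead.
  def stampCreateTime(
      tpart: Partition,
      nowMillis: Long = System.currentTimeMillis()): Partition = {
    tpart.setCreateTime(toEpochSeconds(nowMillis))
    tpart
  }
}
```

Under these assumptions, `toHivePartition` could call `PartitionCreateTime.stampCreateTime(tpart)` just before `new HivePartition(ht, tpart)`. Whether the timestamp should come from the wall clock at that point or be carried on `CatalogTablePartition` is a design question this sketch leaves open.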