Aman Raj created SPARK-44058:
--------------------------------
Summary: Remove deprecated API usage in HiveShim.scala
Key: SPARK-44058
URL: https://issues.apache.org/jira/browse/SPARK-44058
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 3.4.0
Reporter: Aman Raj
Spark's HiveShim.scala invokes the following Hive method through reflection:
createPartitionMethod.invoke(
  hive,
  table,
  spec,
  location,
  params,      // partParams
  null,        // inputFormat
  null,        // outputFormat
  -1: JInteger, // numBuckets
  null,        // cols
  null,        // serializationLib
  null,        // serdeParams
  null,        // bucketCols
  null)        // sortCols
}
Hive no longer has a createPartition implementation with this signature. The
only definition we have is:
public Partition createPartition(Table tbl, Map<String, String> partSpec)
    throws HiveException {
  try {
    org.apache.hadoop.hive.metastore.api.Partition part =
        Partition.createMetaPartitionObject(tbl, partSpec, null);
    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl);
    part.setWriteId(tableSnapshot != null ? tableSnapshot.getWriteId() : 0);
    return new Partition(tbl, getMSC().add_partition(part));
  } catch (Exception e) {
    LOG.error(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}
The issue is that this 12-parameter implementation of createPartition was
added in Hive 0.12 and removed in Hive 0.13. While Spark was using Hive 0.12,
the [SPARK-15334] commit added the call to the 12-parameter implementation.
After Hive migrated to the newer API, the call was never updated in Spark OSS,
which looks to us like a bug on the Spark side.
We need to migrate to the current Hive createPartition method; otherwise this
code path can break when the 12-parameter method is not present.
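
As a rough illustration of the failure mode and one possible way to guard
against it, the sketch below looks up createPartition by signature and falls
back to the two-argument form when the 12-parameter one is missing. It is not
Spark's or Hive's actual code: StubHive, CreatePartitionLookup, and the
parameter types (String, Map, List) are hypothetical stand-ins for the real
Hive classes, chosen only so the example runs on its own.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

public class CreatePartitionLookup {

    /** Hypothetical stand-in for a Hive client that only has the two-argument method. */
    public static class StubHive {
        public String createPartition(String table, Map<String, String> partSpec) {
            return "partition of " + table + " with spec " + partSpec;
        }
    }

    // Stand-in types for the legacy 12-parameter call: table, spec, location,
    // partParams, inputFormat, outputFormat, numBuckets, cols, serializationLib,
    // serdeParams, bucketCols, sortCols. The real Hive types differ.
    public static final Class<?>[] LEGACY = {
        String.class, Map.class, String.class, Map.class, String.class, String.class,
        int.class, List.class, String.class, Map.class, List.class, List.class
    };

    // Stand-in types for the two-argument signature: table, partSpec.
    public static final Class<?>[] CURRENT = { String.class, Map.class };

    /** Returns the public createPartition with these parameter types, or null if absent. */
    public static Method find(Class<?> hiveClass, Class<?>... paramTypes) {
        try {
            return hiveClass.getMethod("createPartition", paramTypes);
        } catch (NoSuchMethodException e) {
            // The loaded class does not declare this overload.
            return null;
        }
    }

    /** Prefers the legacy overload when it exists, otherwise the two-argument one. */
    public static Method resolveCreatePartition(Class<?> hiveClass) {
        Method legacy = find(hiveClass, LEGACY);
        return legacy != null ? legacy : find(hiveClass, CURRENT);
    }

    public static void main(String[] args) throws Exception {
        Method m = resolveCreatePartition(StubHive.class);
        Object result = m.invoke(new StubHive(), "sales", Map.of("dt", "2023-06-14"));
        System.out.println(m.getParameterCount() + " parameters: " + result);
    }
}
```

A lookup that only tries the LEGACY signature returns null here (or, without
the catch, throws NoSuchMethodException), which is the kind of break described
above. Whether a fallback or a straight migration is the better fix for
HiveShim.scala is left open.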