Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22707#discussion_r240002357

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              new Path(table.location))
--- End diff --

Do we still need to consider the old path from `oldPart`? Can't we just write this instead?

```
val oldPartitionPath = ExternalCatalogUtils.generatePartitionPath(
  partitionSpec,
  partitionColumnNames,
  new Path(table.location))
```

Also, could you add a comment here, and in the PR description, explaining how this change solves the issue?
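For reference, here is a minimal, self-contained sketch of the difference between the two options. The `Storage` and `Partition` case classes below are hypothetical stand-ins for Spark's `CatalogStorageFormat` and `CatalogTablePartition`, and `defaultPartitionPath` is a simplified placeholder for `ExternalCatalogUtils.generatePartitionPath` (the real method also escapes special characters in partition values). The diff's version prefers the partition's stored location URI and only falls back to the generated default path; the suggestion above always generates the default path. The two diverge when a partition carries a custom location.

```
import java.net.URI
import org.apache.hadoop.fs.Path

// Hypothetical stand-ins for Spark's partition metadata classes.
case class Storage(locationUri: Option[URI])
case class Partition(storage: Storage)

// Simplified default layout: <tablePath>/<col>=<value>/...
// (the real ExternalCatalogUtils.generatePartitionPath also escapes values).
def defaultPartitionPath(
    spec: Map[String, String],
    partitionColumnNames: Seq[String],
    tablePath: Path): Path = {
  partitionColumnNames.foldLeft(tablePath) { (path, col) =>
    new Path(path, s"$col=${spec(col)}")
  }
}

// The diff's behavior: use the partition's stored URI when present,
// otherwise fall back to the generated default path.
def oldPartitionPath(
    oldPart: Option[Partition],
    spec: Map[String, String],
    partitionColumnNames: Seq[String],
    tablePath: Path): Path = {
  oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
    .getOrElse(defaultPartitionPath(spec, partitionColumnNames, tablePath))
}

// A partition with a custom location resolves to its stored URI,
// not to the default <tablePath>/p=1 layout:
val tablePath = new Path("/warehouse/db.db/tbl")
val custom = Partition(Storage(Some(new URI("/data/custom/location"))))
oldPartitionPath(Some(custom), Map("p" -> "1"), Seq("p"), tablePath)
// -> /data/custom/location
oldPartitionPath(None, Map("p" -> "1"), Seq("p"), tablePath)
// -> /warehouse/db.db/tbl/p=1
```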