Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22707#discussion_r240002357

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              new Path(table.location))
--- End diff --

Do we still need to consider the old path from `oldPart`? Can't we just write this instead?

```
val oldPartitionPath = ExternalCatalogUtils.generatePartitionPath(
  partitionSpec,
  partitionColumnNames,
  new Path(table.location))
```

Also, could you add a comment here, and in the PR description, explaining how this change solves the issue?
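For reference, here is a minimal, self-contained sketch of the difference between the two options. The `Storage` and `Partition` case classes below are hypothetical stand-ins for Spark's `CatalogStorageFormat` and `CatalogTablePartition`, and `defaultPartitionPath` is a simplified placeholder for `ExternalCatalogUtils.generatePartitionPath` (the real method also escapes special characters in partition values). The diff's version prefers the partition's stored location URI and only falls back to the generated default path; the suggestion above always generates the default path. The two diverge when a partition carries a custom location.

```
import java.net.URI
import org.apache.hadoop.fs.Path

// Hypothetical stand-ins for Spark's partition metadata classes.
case class Storage(locationUri: Option[URI])
case class Partition(storage: Storage)

// Simplified default layout: <tablePath>/<col>=<value>/...
// (the real ExternalCatalogUtils.generatePartitionPath also escapes values).
def defaultPartitionPath(
    spec: Map[String, String],
    partitionColumnNames: Seq[String],
    tablePath: Path): Path = {
  partitionColumnNames.foldLeft(tablePath) { (path, col) =>
    new Path(path, s"$col=${spec(col)}")
  }
}

// The diff's behavior: use the partition's stored URI when present,
// otherwise fall back to the generated default path.
def oldPartitionPath(
    oldPart: Option[Partition],
    spec: Map[String, String],
    partitionColumnNames: Seq[String],
    tablePath: Path): Path = {
  oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
    .getOrElse(defaultPartitionPath(spec, partitionColumnNames, tablePath))
}

// A partition with a custom location resolves to its stored URI,
// not to the default <tablePath>/p=1 layout:
val tablePath = new Path("/warehouse/db.db/tbl")
val custom = Partition(Storage(Some(new URI("/data/custom/location"))))
oldPartitionPath(Some(custom), Map("p" -> "1"), Seq("p"), tablePath)
// -> /data/custom/location
oldPartitionPath(None, Map("p" -> "1"), Seq("p"), tablePath)
// -> /warehouse/db.db/tbl/p=1
```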