leesf commented on a change in pull request #1720:
URL: https://github.com/apache/hudi/pull/1720#discussion_r439701681
##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -247,7 +247,13 @@ private[hudi] object HoodieSparkSqlWriter {
hiveSyncConfig.hivePass = parameters(HIVE_PASS_OPT_KEY)
hiveSyncConfig.jdbcUrl = parameters(HIVE_URL_OPT_KEY)
     hiveSyncConfig.partitionFields =
-      ListBuffer(parameters(HIVE_PARTITION_FIELDS_OPT_KEY).split(",").map(_.trim).filter(!_.isEmpty).toList: _*)
+      // Set partitionFields to empty, when the NonPartitionedExtractor is used
+      if (classOf[NonPartitionedExtractor].getName.equals(parameters(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY))) {
+        log.warn(s"Parameter '$HIVE_PARTITION_FIELDS_OPT_KEY' is ignored, since the NonPartitionedExtractor is used")
+        Array.empty[String].toList
+      } else {
+        ListBuffer(parameters(HIVE_PARTITION_FIELDS_OPT_KEY).split(",").map(_.trim).filter(!_.isEmpty).toList: _*)
+      }
Review comment:
   I think we should move this logic to the hudi-hive module. Writing data
to Hudi via the Spark datasource and then syncing to Hive is only one path;
users may also sync to Hive directly through the API (HiveSyncTool), and we
should handle that case as well.
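
   To illustrate the suggestion, the partition-field resolution could live in
one shared helper that both the Spark datasource path and HiveSyncTool call.
This is only a sketch: `PartitionFieldsResolver` and its method are
hypothetical names, not part of the actual Hudi API, and the extractor class
name is passed in as a plain string to keep the example self-contained.

```scala
object PartitionFieldsResolver {
  // Stand-in for org.apache.hudi.hive.NonPartitionedExtractor's class name.
  val NonPartitionedExtractorName = "org.apache.hudi.hive.NonPartitionedExtractor"

  /** Resolve the Hive partition fields from sync options.
    * Returns an empty list when the non-partitioned extractor is configured,
    * mirroring the intent of the diff above; otherwise splits the
    * comma-separated field list, trimming blanks. */
  def resolvePartitionFields(extractorClass: String,
                             partitionFieldsOpt: String): List[String] =
    if (NonPartitionedExtractorName == extractorClass) List.empty[String]
    else partitionFieldsOpt.split(",").map(_.trim).filter(_.nonEmpty).toList
}
```

With the logic factored out like this, both sync entry points behave
identically and the NonPartitionedExtractor special case is handled once.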
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]