[GitHub] [hudi] nsivabalan commented on a diff in pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys

GitBox Mon, 06 Jun 2022 10:18:41 -0700


nsivabalan commented on code in PR #5664:
URL: https://github.com/apache/hudi/pull/5664#discussion_r889197028



##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java:
##########
@@ -87,6 +89,9 @@ public BulkInsertDataInternalWriterHelper(HoodieTable hoodieTable, HoodieWriteCo
     this.populateMetaFields = populateMetaFields;
     this.arePartitionRecordsSorted = arePartitionRecordsSorted;
     this.fileIdPrefix = UUID.randomUUID().toString();
+    this.isHiveStylePartitioning = writeConfig.getProps().containsKey(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING().key())
+        ? Boolean.parseBoolean((String) writeConfig.getProps().get(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING().key()))
+        : Boolean.parseBoolean(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING().defaultValue());

Review Comment:
   Initially I did not add it because DataSourceOptions wasn't reachable from
WriteConfig, but I found out that I can use KeyGeneratorOptions instead. Will fix it.
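
For illustration only, the lookup-with-default pattern in the hunk above can be sketched with plain `java.util.Properties`. The class name `HiveStyleFlagSketch` and the literal key and default are assumptions for this sketch; in the patch they would come from a config constant (DataSourceWriteOptions today, KeyGeneratorOptions after the fix), not from literals.

```java
import java.util.Properties;

// Hypothetical stand-in for the constructor logic; not part of the PR.
class HiveStyleFlagSketch {
    // Assumed key and default value; the real ones are defined by Hudi's config classes.
    static final String HIVE_STYLE_KEY = "hoodie.datasource.write.hive_style_partitioning";
    static final String HIVE_STYLE_DEFAULT = "false";

    // Returns the configured flag, falling back to the default when the key is absent.
    static boolean isHiveStylePartitioning(Properties props) {
        return Boolean.parseBoolean(props.getProperty(HIVE_STYLE_KEY, HIVE_STYLE_DEFAULT));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(isHiveStylePartitioning(props)); // key absent, default applies
        props.setProperty(HIVE_STYLE_KEY, "true");
        System.out.println(isHiveStylePartitioning(props)); // explicit value wins
    }
}
```

`getProperty(key, default)` folds the `containsKey` check and the cast into one call, which is one possible way to shorten the ternary; whether that fits the actual WriteConfig API is not confirmed here.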



##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java:
##########
@@ -128,7 +133,11 @@ public void write(InternalRow record) throws IOException {
         if (!keyGeneratorOpt.isPresent()) { // NoPartitionerKeyGen
           partitionPath = "";
         } else if (simpleKeyGen) { // SimpleKeyGen
-          partitionPath = (record.get(simplePartitionFieldIndex, simplePartitionFieldDataType)).toString();
+          Object parititionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType);
+          partitionPath = parititionPathValue != null ? parititionPathValue.toString() : PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH;
+          if (isHiveStylePartitioning) {
+            partitionPath = (keyGeneratorOpt.get()).getPartitionPathFields().get(0) + "=" + partitionPath;

Review Comment:
   I have filed a follow-up JIRA for this:
https://issues.apache.org/jira/browse/HUDI-4199
There could be more gaps here. For now, this patch focuses on hive-style partitioning.



##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java:
##########
@@ -128,7 +133,11 @@ public void write(InternalRow record) throws IOException {
         if (!keyGeneratorOpt.isPresent()) { // NoPartitionerKeyGen
           partitionPath = "";
         } else if (simpleKeyGen) { // SimpleKeyGen
-          partitionPath = (record.get(simplePartitionFieldIndex, simplePartitionFieldDataType)).toString();
+          Object parititionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType);
+          partitionPath = parititionPathValue != null ? parititionPathValue.toString() : PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH;
+          if (isHiveStylePartitioning) {
+            partitionPath = (keyGeneratorOpt.get()).getPartitionPathFields().get(0) + "=" + partitionPath;

Review Comment:
   This block applies only to the simple key generator. For other key
generators, L143 is used, so we should be good.
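
As a rough sketch of what the SimpleKeyGen branch in the hunk above computes, the logic can be isolated into a standalone method. The class `PartitionPathSketch`, its method signature, and the placeholder value `"default"` are assumptions made for this example; the patch itself reads the value from an `InternalRow` and uses `PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone version of the SimpleKeyGen branch; not part of the PR.
class PartitionPathSketch {
    // Assumed placeholder for a missing partition value; the real constant's value may differ.
    static final String DEFAULT_PARTITION_PATH = "default";

    static String partitionPath(Object value, List<String> partitionFields, boolean hiveStyle) {
        // A null partition value maps to the default partition instead of throwing an NPE.
        String path = value != null ? value.toString() : DEFAULT_PARTITION_PATH;
        if (hiveStyle) {
            // Hive style prefixes the value with "<field>="; only the first field is used,
            // which matches a single-field (simple) key generator.
            path = partitionFields.get(0) + "=" + path;
        }
        return path;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("region");
        System.out.println(partitionPath("emea", fields, false)); // emea
        System.out.println(partitionPath("emea", fields, true));  // region=emea
        System.out.println(partitionPath(null, fields, true));    // region=default
    }
}
```

The sketch covers only the single-field case; other key generators take a different code path, as noted in the comment above.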



