[ https://issues.apache.org/jira/browse/HUDI-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345904#comment-17345904 ]
Ethan Guo commented on HUDI-1888:
---------------------------------
https://github.com/apache/hudi/pull/2957
> Fix NPE in `RowKeyGeneratorHelper#getNestedFieldVal` when row writer is enabled
> -------------------------------------------------------------------------------
>
> Key: HUDI-1888
> URL: https://issues.apache.org/jira/browse/HUDI-1888
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
>
> When the row writer is enabled, a NullPointerException is thrown when
> inserting records whose partition path is taken from a nested field.
>
> To reproduce:
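> {code:scala}
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.spark.sql.SaveMode
>
> // df is the input DataFrame and basePath the target table location.
> df.write.format("hudi")
>   .option(OPERATION_OPT_KEY, "bulk_insert")
>   .option(PRECOMBINE_FIELD_OPT_KEY, "timestamp")
>   .option(RECORDKEY_FIELD_OPT_KEY, "_row_key")
>   // Partition path lives in a nested field; a null value here triggers the NPE.
>   .option(PARTITIONPATH_FIELD_OPT_KEY, "fare.currency")
>   .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>   .option("hoodie.metadata.enable", "true")
>   .option("hoodie.datasource.write.row.writer.enable", "true")
>   .option("hoodie.bulkinsert.shuffle.parallelism", "2")
>   .mode(SaveMode.Overwrite)
>   .save(basePath)
> {code}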
>
> Stacktrace:
> {code:java}
> Caused by: java.lang.NullPointerException
>     at org.apache.hudi.keygen.RowKeyGeneratorHelper.lambda$getPartitionPathFromRow$1(RowKeyGeneratorHelper.java:117)
>     at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
>     at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>     at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>     at org.apache.hudi.keygen.RowKeyGeneratorHelper.getPartitionPathFromRow(RowKeyGeneratorHelper.java:124)
>     at org.apache.hudi.keygen.SimpleKeyGenerator.getPartitionPath(SimpleKeyGenerator.java:72)
>     at org.apache.spark.sql.UDFRegistration$$anonfun$259.apply(UDFRegistration.scala:759)
>     ... 22 more
> {code}
>
> This happens when the value of the nested partition path field (here
> `fare.currency`) is null. As the stack trace above shows,
> `RowKeyGeneratorHelper` does not handle this case.
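
To illustrate the kind of null handling the description calls for, here is a minimal sketch in plain Java. It uses nested maps as a stand-in for Spark `Row` objects so that it runs without Spark or Hudi on the classpath. The class `NestedFieldSketch`, the method `nestedPartitionPath`, and the `"default"` fallback value are hypothetical names chosen for illustration; they are not taken from the Hudi code base or from the pull request linked above.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedFieldSketch {
    // Hypothetical placeholder returned when the nested value is missing or null.
    public static final String FALLBACK_PARTITION_PATH = "default";

    // Walks a dotted path such as "fare.currency" through nested maps and
    // returns the fallback instead of dereferencing a null along the way.
    public static String nestedPartitionPath(Map<String, Object> record, String dottedField) {
        Object current = record;
        for (String part : dottedField.split("\\.")) {
            if (!(current instanceof Map)) {
                // Parent is null (or not a nested structure): stop before an NPE.
                return FALLBACK_PARTITION_PATH;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        // The leaf value itself may also be null.
        return current == null ? FALLBACK_PARTITION_PATH : current.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fare = new HashMap<>();
        fare.put("currency", null); // the case that triggers the reported NPE
        Map<String, Object> record = new HashMap<>();
        record.put("fare", fare);
        System.out.println(nestedPartitionPath(record, "fare.currency")); // prints "default"
    }
}
```

The sketch checks for null at every level of the path, so both a null leaf (`fare.currency`) and a null parent (`fare`) fall back to the placeholder rather than throwing.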