[ https://issues.apache.org/jira/browse/HUDI-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345904#comment-17345904 ]

Ethan Guo commented on HUDI-1888:
---------------------------------

https://github.com/apache/hudi/pull/2957

> Fix NPE in `RowKeyGeneratorHelper#getNestedFieldVal` when row writer is enabled
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-1888
>                 URL: https://issues.apache.org/jira/browse/HUDI-1888
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>
> When the row writer is enabled, a NullPointerException is thrown when inserting
> records whose partition path is a nested field.
>  
> To reproduce:
> {code:scala}
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.spark.sql.SaveMode
>
> // Assumes `df` (with a nested `fare` struct) and `basePath` are already defined
> df.write.format("hudi")
>   .option(OPERATION_OPT_KEY, "bulk_insert")
>   .option(PRECOMBINE_FIELD_OPT_KEY, "timestamp")
>   .option(RECORDKEY_FIELD_OPT_KEY, "_row_key")
>   .option(PARTITIONPATH_FIELD_OPT_KEY, "fare.currency")
>   .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test")
>   .option("hoodie.metadata.enable", "true")
>   .option("hoodie.datasource.write.row.writer.enable", "true")
>   .option("hoodie.bulkinsert.shuffle.parallelism", "2")
>   .mode(SaveMode.Overwrite)
>   .save(basePath){code}
>  
> Stacktrace:
> {code:java}
> Caused by: java.lang.NullPointerException
>       at org.apache.hudi.keygen.RowKeyGeneratorHelper.lambda$getPartitionPathFromRow$1(RowKeyGeneratorHelper.java:117)
>       at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
>       at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>       at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
>       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>       at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>       at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>       at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>       at org.apache.hudi.keygen.RowKeyGeneratorHelper.getPartitionPathFromRow(RowKeyGeneratorHelper.java:124)
>       at org.apache.hudi.keygen.SimpleKeyGenerator.getPartitionPath(SimpleKeyGenerator.java:72)
>       at org.apache.spark.sql.UDFRegistration$$anonfun$259.apply(UDFRegistration.scala:759)
>       ... 22 more
> {code}
>  
> This happens when a record's value in the nested partition-path field (here
> `fare.currency`) is null. `RowKeyGeneratorHelper` does not handle this null
> value and dereferences it, which raises the NPE shown above.
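
A minimal, self-contained sketch of null-safe nested-field handling, assuming nested fields are resolved through a list of per-level field positions and that a null or empty nested value should map to a placeholder partition instead of being dereferenced. The class name, method names, and the `default` placeholder are hypothetical and this is not the code from the PR; rows are modeled as `Object[]` (nested structs as nested `Object[]`) so it runs without Spark.

{code:java}
import java.util.Arrays;
import java.util.List;

class NestedPartitionPathSketch {

  // Hypothetical placeholder partition used when the nested value is null or empty.
  static final String DEFAULT_PARTITION = "default";

  // Walks a nested row (structs modeled as nested Object[]), following one
  // position per nesting level. Returns null as soon as any level (a parent
  // struct or the leaf value) is null, instead of dereferencing it.
  static Object getNestedFieldVal(Object[] row, List<Integer> positions) {
    Object current = row;
    for (int pos : positions) {
      if (current == null) {
        return null;
      }
      current = ((Object[]) current)[pos];
    }
    return current;
  }

  // Builds the partition path for one nested field, mapping a null or empty
  // value to DEFAULT_PARTITION rather than calling toString() on null.
  static String partitionPath(Object[] row, List<Integer> positions) {
    Object val = getNestedFieldVal(row, positions);
    if (val == null || val.toString().isEmpty()) {
      return DEFAULT_PARTITION;
    }
    return val.toString();
  }

  public static void main(String[] args) {
    // Assumed row layout: [_row_key, timestamp, fare], where fare = [amount, currency]
    List<Integer> fareCurrency = Arrays.asList(2, 1);
    Object[] withCurrency = {"key1", 1L, new Object[] {12.5, "USD"}};
    Object[] nullCurrency = {"key2", 2L, new Object[] {7.0, null}};
    Object[] nullFare = {"key3", 3L, null};
    System.out.println(partitionPath(withCurrency, fareCurrency)); // USD
    System.out.println(partitionPath(nullCurrency, fareCurrency)); // default
    System.out.println(partitionPath(nullFare, fareCurrency));     // default
  }
}
{code}

Checking for null at every level, not only at the leaf, also covers the case where the parent struct (`fare` here) is itself null.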



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
