[
https://issues.apache.org/jira/browse/SPARK-37913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alana Young updated SPARK-37913:
--------------------------------
Description:
I am trying to create and persist an ML pipeline model using a custom Spark
transformer that I created based on the [Unary Transformer
example|https://github.com/apache/spark/blob/v3.1.2/examples/src/main/scala/org/apache/spark/examples/ml/UnaryTransformerExample.scala]
provided by Spark. I am able to save and load the transformer on its own. When I
include the custom transformer as a stage in a pipeline model, I can save the
model, but I am unable to load it. Here is the stack trace of the exception:
{code:java}
01-14-2022 03:49:52 PM ERROR Instrumentation: java.lang.NullPointerException
	at java.base/java.lang.reflect.Method.invoke(Method.java:559)
	at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstanceReader(ReadWrite.scala:631)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:276)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
	at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:160)
	at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:155)
	at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:349)
	at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:355)
	at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:355)
	at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:337)
	at com.dtech.scala.pipeline.PipelineProcess.process(PipelineProcess.scala:122)
	at com.dtech.scala.pipeline.PipelineProcess$.main(PipelineProcess.scala:448)
	at com.dtech.scala.pipeline.PipelineProcess.main(PipelineProcess.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala){code}
*Source Code*
[Unary Transformer|https://gist.github.com/ally1221/ff10ec50f7ef98fb6cd365172195bfd5]
[Persist Unary Transformer & Pipeline Model|https://gist.github.com/ally1221/42473cdc818a8cf795ac78d65d48ee14]
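For context, the NPE originates in {{DefaultParamsReader$.loadParamsInstanceReader}}, which reflectively invokes {{read}} on the stage class named in the saved pipeline metadata. A commonly reported cause of this failure is a custom transformer whose companion object does not extend {{DefaultParamsReadable}} (or whose class name in metadata does not match a loadable class). The sketch below shows the expected pattern; {{MyUnaryTransformer}} is a hypothetical stand-in for the transformer in the gists, not the reporter's actual code:

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.types.{DataType, DoubleType}

// Hypothetical custom transformer, modeled on Spark's UnaryTransformerExample.
// DefaultParamsWritable provides write() so the stage can be saved.
class MyUnaryTransformer(override val uid: String)
  extends UnaryTransformer[Double, Double, MyUnaryTransformer]
  with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("myUnary"))

  // The element-wise function applied to the input column.
  override protected def createTransformFunc: Double => Double = _ + 1.0

  override protected def outputDataType: DataType = DoubleType
}

// The companion object is what PipelineModel.load resolves via reflection:
// DefaultParamsReader looks up the class from the saved metadata and invokes
// its read method. If this companion (or its DefaultParamsReadable mixin) is
// missing, the reflective lookup can surface as the NullPointerException above.
object MyUnaryTransformer extends DefaultParamsReadable[MyUnaryTransformer] {
  override def load(path: String): MyUnaryTransformer = super.load(path)
}
```

If the companion object is already present, another thing worth checking is that the transformer's class is on the driver classpath under the same fully qualified name that was recorded when the pipeline was saved.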
> Null Pointer Exception when Loading ML Pipeline Model with Custom Transformer
> -----------------------------------------------------------------------------
>
> Key: SPARK-37913
> URL: https://issues.apache.org/jira/browse/SPARK-37913
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.2
> Environment: Spark 3.1.2, Scala 2.12, Java 11
> Reporter: Alana Young
> Priority: Critical
> Labels: MLPipelineModels, MLPipelines
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)