So, it turns out this was a Spark version mismatch between the Beam JobServer and the Spark cluster itself. Thanks for the pointer!
I'm running both Beam and Spark on Docker: the Spark image [0] provides Spark 3.3.0 and Scala 2.12, but I had used apache/beam_spark_job_server:2.41.0 [1], which ships Spark 2.4.x libraries built against Scala 2.11. Instead, I needed the apache/beam_spark3_job_server:2.41.0 image [2], which ships Spark 3.3.x. (A minimal sketch of the corrected compose services is below the quoted thread.)

[0] https://hub.docker.com/r/apache/spark/tags
[1] https://hub.docker.com/r/apache/beam_spark_job_server/tags
[2] https://hub.docker.com/r/apache/beam_spark3_job_server/tags

On Thu, Aug 25, 2022 at 6:14 PM Sean Owen <sro...@gmail.com> wrote:

> This suggests you have mixed two versions of Spark libraries. You probably
> packaged Spark itself in your Spark app?
>
> On Thu, Aug 25, 2022 at 4:56 PM Elliot Metsger <emets...@gmail.com> wrote:
>
>> Howdy folks,
>>
>> I'm a relative newbie to Spark, and super new to Beam. (I've asked this
>> question on the Beam lists, but this seems like a Spark-related issue, so
>> I'm trying my query here, too.) I'm attempting to get a simple Beam
>> pipeline (using the Go SDK) running on Spark. There seems to be an
>> incompatibility between Java components related to object serialization
>> that prevents a simple "hello world" pipeline from executing successfully.
>> I'm really looking for some direction on where to look, so if anyone has
>> any pointers, they'd be appreciated!
>>
>> When I submit the job via the Go SDK, it errors out on the Spark side
>> with:
>>
>> 22/08/25 12:45:59 ERROR TransportRequestHandler: Error while invoking
>> RpcHandler#receive() for one-way message.
>> java.io.InvalidClassException:
>> org.apache.spark.deploy.ApplicationDescription; local class incompatible:
>> stream classdesc serialVersionUID = 6543101073799644159, local class
>> serialVersionUID = 1574364215946805297
>>
>> I'm using apache/beam_spark_job_server:2.41.0 and apache/spark:latest
>> (docker-compose [0], hello world wordcount example pipeline [1]).
>>
>> It appears that the org.apache.spark.deploy.ApplicationDescription object
>> (or something in its graph) doesn't explicitly assign a serialVersionUID,
>> so the JVM-generated UIDs differ across Spark versions.
>>
>> This simple repo [2] should demonstrate the issue. Any pointers would be
>> appreciated!
>>
>> [0]: https://github.com/emetsger/beam-test/blob/develop/docker-compose.yml
>> [1]: https://github.com/emetsger/beam-test/blob/develop/debugging_wordcount.go
>> [2]: https://github.com/emetsger/beam-test
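For anyone hitting the same mismatch, here's roughly what the fixed services
look like in my docker-compose.yml. This is a minimal sketch, not the file
from the repo: service names, ports, and the exact Spark 3.3.x tag are
illustrative, so pin whichever tag actually matches your cluster.

  version: "3.8"
  services:
    spark:
      image: apache/spark:3.3.0   # Spark 3.3.x / Scala 2.12; pin your actual tag
      command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
      ports:
        - "7077:7077"   # master RPC port the job server connects to
        - "8080:8080"   # master web UI

    spark-worker:
      image: apache/spark:3.3.0   # must match the master's Spark version
      command: /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark:7077
      depends_on:
        - spark

    beam-job-server:
      # The spark3 flavor ships Spark 3.x client libraries; the plain
      # beam_spark_job_server image ships Spark 2.4.x and trips the
      # serialVersionUID mismatch against a Spark 3 master.
      image: apache/beam_spark3_job_server:2.41.0
      command: --spark-master-url=spark://spark:7077
      ports:
        - "8099:8099"   # job endpoint the Beam SDK submits to
      depends_on:
        - spark

If you want to confirm a mismatch like this yourself, the JDK's serialver
tool, run against org.apache.spark.deploy.ApplicationDescription on each
side's classpath, should reproduce the two differing UIDs from the exception.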