So, it turns out this was a Spark version mismatch between the Beam
JobServer and the Spark platform.  Thanks for the pointer!

I'm running both Beam and Spark on Docker.  The Spark image [0] provides
Spark 3.3.0 built against Scala 2.12, but I was using
apache/beam_spark_job_server:2.41.0 [1], which bundles Spark 2.4.x
libraries built against Scala 2.11.  I needed to use the
apache/beam_spark3_job_server:2.41.0 image [2] instead, which bundles
Spark 3.3.x and matches the cluster.

[0] https://hub.docker.com/r/apache/spark/tags
[1] https://hub.docker.com/r/apache/beam_spark_job_server/tags
[2] https://hub.docker.com/r/apache/beam_spark3_job_server/tags
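
For anyone who hits the same thing, the fix amounted to swapping the job
server image in docker-compose.yml.  A minimal sketch (service names,
ports, commands, and the exact image tags reflect my setup and may differ
in yours):

  services:
    spark:
      # Spark standalone master; 3.3.0 matches the spark3 job server's libraries
      image: apache/spark:3.3.0
      command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
      ports:
        - "7077:7077"
    beam-jobserver:
      # was apache/beam_spark_job_server:2.41.0 (Spark 2.4.x, Scala 2.11)
      image: apache/beam_spark3_job_server:2.41.0
      command: --spark-master-url=spark://spark:7077
      ports:
        - "8099:8099"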

On Thu, Aug 25, 2022 at 6:14 PM Sean Owen <sro...@gmail.com> wrote:

> This suggests you have mixed two versions of Spark libraries. You probably
> packaged Spark itself in your Spark app?
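>
> For what it's worth, the usual guard against that when building a JVM
> Spark app is to mark the Spark dependencies as provided so they aren't
> bundled into the application jar; e.g. in Maven (a sketch, with an
> assumed Spark version):
>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.12</artifactId>
>     <version>3.3.0</version>
>     <scope>provided</scope>
>   </dependency>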
>
> On Thu, Aug 25, 2022 at 4:56 PM Elliot Metsger <emets...@gmail.com> wrote:
>
>> Howdy folks,
>>
>> Relative newbie to Spark, and super new to Beam.  (I've asked this
>> question on Beam lists, but this seems like a Spark-related issue so I'm
>> trying my query here, too).  I'm attempting to get a simple Beam pipeline
>> (using the Go SDK) running on Spark. There seems to be an incompatibility
>> between Java components related to object serialization, which prevents a
>> simple "hello world" pipeline from executing successfully.  I'm really
>> looking for some direction on where to look, so if anyone has any pointers,
>> it is appreciated!
>>
>> When I submit the job via the Go SDK, it errors out on the Spark side
>> with:
>>
>>   22/08/25 12:45:59 ERROR TransportRequestHandler: Error while invoking
>>   RpcHandler#receive() for one-way message.
>>   java.io.InvalidClassException:
>>   org.apache.spark.deploy.ApplicationDescription; local class incompatible:
>>   stream classdesc serialVersionUID = 6543101073799644159, local class
>>   serialVersionUID = 1574364215946805297
>>
>> I’m using apache/beam_spark_job_server:2.41.0 and apache/spark:latest
>> (docker-compose file [0], hello world wordcount example pipeline [1]).
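>>
>> For reference, I submit the pipeline to the job server roughly like
>> this (a sketch; the endpoint and flags reflect my local setup):
>>
>>   go run debugging_wordcount.go \
>>     --runner=universal \
>>     --endpoint=localhost:8099 \
>>     --environment_type=LOOPBACK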
>>
>> It appears that the org.apache.spark.deploy.ApplicationDescription class
>> (or something in its object graph) doesn't explicitly assign a
>> serialVersionUID, so the JVM computes one from the class structure, and
>> the IDs diverge whenever the two sides run different Spark versions.
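>>
>> As a toy illustration of the failure mode (not Spark's actual class,
>> just a sketch of how a default serialVersionUID is computed and why it
>> can diverge between builds):
>>
>>   import java.io.ObjectStreamClass;
>>   import java.io.Serializable;
>>
>>   // No explicit serialVersionUID here, so the JVM derives one from the
>>   // class's structure (name, fields, methods).  Two builds that differ
>>   // structurally compute different IDs, and deserializing across them
>>   // throws java.io.InvalidClassException.
>>   class AppDescription implements Serializable {
>>       String name;  // adding or removing a field changes the computed ID
>>   }
>>
>>   public class SerialUidDemo {
>>       public static void main(String[] args) {
>>           long uid = ObjectStreamClass.lookup(AppDescription.class)
>>                                       .getSerialVersionUID();
>>           System.out.println("computed serialVersionUID = " + uid);
>>       }
>>   }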
>>
>> This simple repo[2] should demonstrate the issue.  Any pointers would be
>> appreciated!
>>
>> [0]:
>> https://github.com/emetsger/beam-test/blob/develop/docker-compose.yml
>> [1]:
>> https://github.com/emetsger/beam-test/blob/develop/debugging_wordcount.go
>> [2]: https://github.com/emetsger/beam-test
>>
>
