mosche commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1262423620

   @nitinlkoin1984 I finally found some time to look deeper into this. Sorry for the hassle; finding the job-server in this state is a bit disappointing.
   
   > Also what is the working and tested spark job server version and it's 
compatible Spark version.
   
   Unfortunately this is a weakness of the existing test infrastructure: it uses Spark in local mode, and in that setup such a classpath issue won't be discovered.
   
   Anyway, I've done some testing:
   
   - You can fairly easily build yourself a custom version of the `beam_spark3_job_server` image on the latest Beam master for Spark `3.2.2` (or later). These versions of Spark use Scala 2.12.15 and don't suffer from the Scala bug causing this. Here are the detailed steps:
      1. Pull the source code of Beam from https://github.com/apache/beam and check out tag `v.2.14.0`.
      2. Update the Spark version to `3.2.2` in these two places:
        
https://github.com/apache/beam/blob/f37795e326a75310828518464189440b14863834/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L496
        
https://github.com/apache/beam/blob/f37795e326a75310828518464189440b14863834/runners/spark/3/build.gradle#L23
      3. In the project directory, run the gradle command to build the docker container:
        ```
        ./gradlew :runners:spark:3:job-server:container:docker
        ```
        This will build `apache/beam_spark3_job_server:latest`, which will then be available for local use.
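
     Once built, the image can be started like any other job-server release. A sketch of the invocation (the Spark master URL is a placeholder for your own cluster):
     ```shell
     # Start the locally built job server and point it at your Spark cluster.
     # The master URL below is a placeholder; adjust it for your setup.
     docker run --net=host apache/beam_spark3_job_server:latest \
         --spark-master-url=spark://localhost:7077
     ```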
   
   - Downgrading Scala in the job-server image to 2.12.10 is also possible, but not as obvious: the Scala version is bumped by a transitive dependency.
      1. Append the following lines to [runners/spark/3/job-server/build.gradle](https://github.com/apache/beam/blob/17453e71a81ba774ab451ad141fc8c21ea8770c9/runners/spark/3/job-server/build.gradle):
         ```
         configurations.runtimeClasspath {
           resolutionStrategy {
             force "org.scala-lang:scala-library:2.12.10"
           }
         }
         ```
      2. In the project directory, run the gradle commands to clean and rebuild the docker container:
         ```
         ./gradlew :runners:spark:3:job-server:clean
         ./gradlew :runners:spark:3:job-server:container:docker
         ```
         This will build `apache/beam_spark3_job_server:latest`, which will then be available for local use.
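
      To confirm that the forced version actually wins the resolution, a small helper task (the task name is my own invention) could be appended to the same build.gradle to print the resolved Scala artifacts:
      ```
      // Hypothetical helper task that prints which org.scala-lang artifacts
      // end up on the job-server runtime classpath, so you can confirm the
      // forced 2.12.10 version actually won.
      tasks.register("printScalaVersions") {
        doLast {
          configurations.runtimeClasspath
            .resolvedConfiguration.resolvedArtifacts
            .findAll { it.moduleVersion.id.group == "org.scala-lang" }
            .each { println it.moduleVersion.id }
        }
      }
      ```
      Then run `./gradlew :runners:spark:3:job-server:printScalaVersions` and check the output before building the container.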
   
   Alternatively you could build yourself a custom Spark 3.1.2 image that 
contains Scala 2.12.15 (instead of 2.12.10) on the classpath. But I don't think 
that's generally a feasible option.
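   
   For completeness, such an image could be sketched roughly like this. This is a hypothetical Dockerfile: the base image and the `/opt/spark/jars` location are assumptions about your Spark image layout, not something I've verified.
   ```
   # Hypothetical sketch: replace the bundled Scala library in an existing
   # Spark 3.1.2 image. BASE_IMAGE and the jar location are assumptions.
   ARG BASE_IMAGE=your-spark-3.1.2-image
   FROM ${BASE_IMAGE}
   # Remove the old Scala 2.12.10 jar and drop in 2.12.15 instead.
   # Note: scala-reflect may need the same treatment.
   RUN rm /opt/spark/jars/scala-library-2.12.10.jar
   ADD https://repo1.maven.org/maven2/org/scala-lang/scala-library/2.12.15/scala-library-2.12.15.jar \
       /opt/spark/jars/
   ```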
   
   Let me know if any of these options help!
   
   @aromanenko-dev The 2nd option would fix the job-server without having to bump Spark. On the other hand, bumping to Spark 3.2.2 seems to be the more robust, long-term solution. But the biggest concern there is the Avro dependency upgrade (1.10). What do you think? Anyone else who could chime in?

