Hi everyone,

I am encountering an annoying issue when running Spark with an external jar
dependency downloaded from Maven. This is how we run it:

spark-shell --repositories <our-own-maven-release-repo> --packages
<our-package:latest.release>

When we release a new version with a big change in the API, things start to
randomly break for some users. For example, in version 0.44 we had a class
DateUtils (used by class Utils) that was dropped in version 0.45. After
version 0.45 was released (Spark shows it is correctly downloading the new
jar from Maven), some users calling the class Utils got

NoClassDefFoundError for class DateUtils

To me this looks like a caching problem. Probably the ClassLoader on some
node (master or an executor) is still pointing to v0.44, and when loading
Utils it tries to find the DateUtils class, which has disappeared in the
newer jar. I'm not sure how this can happen; this is only an intuition.

Does anyone have an idea how to solve this? It is also very hard to debug,
since I couldn't find a pattern to reproduce it. It happens on every release
that changes a class name, but not for everyone running the job (that's why
caching looked like a good hint to me).
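One workaround I'm considering (only a sketch, not verified): force-clearing
the local Ivy cache that spark-shell's --packages resolution uses before
launching, so no stale jar from a previous release can be picked up. This
assumes the default ~/.ivy2 cache location; the snippet below simulates the
cleanup in a temporary directory so it can be tried without touching a real
cache:

```shell
# Sketch: clear the Ivy cache before starting spark-shell, so --packages
# re-resolves the artifact instead of reusing a stale cached jar.
# Assumption: the default cache location ~/.ivy2/cache is in use.
# For safety, this simulates the cleanup under a temp directory.
CACHE="$(mktemp -d)/ivy2-cache"

# Pretend a previously resolved release is sitting in the cache
# (com.example/mypkg is a hypothetical coordinate, not our real one).
mkdir -p "$CACHE/com.example/mypkg"

# Remove the cached artifacts; in real use this would be:
#   rm -rf ~/.ivy2/cache/<our-group-id>
rm -rf "$CACHE"

# Confirm nothing stale is left before launching spark-shell
[ ! -d "$CACHE" ] && echo "cache cleared"
```

In real use, pinning an explicit version (e.g. our-package:0.45 instead of
latest.release) on the --packages flag might also avoid mixed resolutions,
though that doesn't explain the ClassLoader behavior above.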

Thanks,
*Alessandro Liparoti*
