Christian Kadner created BAHIR-38:
-------------------------------------
Summary: Spark-submit does not use latest locally installed Bahir
packages
Key: BAHIR-38
URL: https://issues.apache.org/jira/browse/BAHIR-38
Project: Bahir
Issue Type: Bug
Components: Build
Affects Versions: 2.0.0
Environment: Maven (3.3.9) on Mac OS X
Reporter: Christian Kadner
Assignee: Christian Kadner
We use {{`spark-submit --packages <maven-coordinates> ...`}} to run Spark with
any of the Bahir extensions.
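For instance (the application file name here is hypothetical, and {{2.0.0}} stands in for whatever released version is being used):
{code}
${SPARK_HOME}/bin/spark-submit \
    --packages org.apache.bahir:spark-streaming-mqtt_2.11:2.0.0 \
    --master local[2] \
    your-app.py
{code}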
In order to perform a _manual integration test_ of a Bahir code change,
developers have to _build_ the respective Bahir module and then _install_ it
into their *local Maven repository*. Then, when running {{`spark-submit
--packages <maven-coordinates> ...`}}, Spark will use *Ivy* to resolve the given
_maven-coordinates_ in order to add the necessary jar files to the classpath.
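Concretely, the two mechanisms use separate locations on disk (a sketch, assuming the default Maven and Ivy paths on Mac OS X/Linux):
{code}
# step 1: build and install the module into the local Maven repository
mvn clean install -pl streaming-mqtt   # lands under ~/.m2/repository/org/apache/bahir/

# step 2: spark-submit resolves --packages via Ivy, which keeps its own cache
ls ~/.ivy2/cache/org.apache.bahir/     # a separate location from the Maven repository
{code}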
The first time Ivy encounters new Maven coordinates, it downloads the
corresponding jar files from the local or a remote Maven repository. On all
subsequent runs, Ivy just reuses the previously cached jar files, keyed by
group ID, artifact ID and version, irrespective of their creation timestamp.
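Once a SNAPSHOT jar has been cached this way, Ivy keeps serving it (a sketch, assuming Ivy's default cache layout under {{~/.ivy2/cache}}):
{code}
# Ivy keys its cache on group ID, artifact ID and version only, so a
# rebuilt 2.0.0-SNAPSHOT jar in ~/.m2 is never fetched again
ls ~/.ivy2/cache/org.apache.bahir/spark-streaming-mqtt_2.11/jars/
# spark-streaming-mqtt_2.11-2.0.0-SNAPSHOT.jar   <- stale copy, reused as-is
{code}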
This behavior is fine when using spark-submit with released versions of Spark
packages. For continuous development and integration testing, however, that Ivy
caching behavior poses a problem.
To *work around* it, developers have to *clear the local Ivy cache* each time
they _install_ a new version of a Bahir package into their local Maven
repository, before they run spark-submit.
For example, to test a code change in the {{streaming-mqtt}} module, we would
have to do the following:
{code}
# rebuild the module and install it into the local Maven repository
mvn clean install -pl streaming-mqtt

# clear the stale copy from the local Ivy cache
rm -rf ~/.ivy2/cache/org.apache.bahir/spark-streaming-mqtt_2.11/

# re-run the integration test against the freshly installed package
${SPARK_HOME}/bin/spark-submit \
    --packages org.apache.bahir:spark-streaming-mqtt_2.11:2.0.0-SNAPSHOT \
    test.py
{code}
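Note that, depending on the Spark version, {{spark-submit}} may additionally retrieve the resolved jars into a flat {{~/.ivy2/jars}} directory; if so, the stale copy there would have to be removed as well (the file name pattern below is an assumption based on Spark's default Ivy retrieval settings):
{code}
rm -f ~/.ivy2/jars/org.apache.bahir_spark-streaming-mqtt_2.11-2.0.0-SNAPSHOT.jar
{code}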