[ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660553#comment-14660553
 ] 

Ryan Williams commented on SPARK-1517:
--------------------------------------

h3. Maven snapshots

I hear your point that idiomatic Maven-snapshot workflows are not well suited 
to this task. Something I've been doing instead is running commands like this 
from within a Spark repo:

{code}
$ sha=$(git --no-pager log --no-walk --format="%h" HEAD)
$ mvn versions:set -DgenerateBackupPoms=false -DnewVersion=$sha
$ mvn install -DskipTests
{code}

This renames the version in all POMs to the abbreviated SHA of {{HEAD}}, builds 
Spark, and installs the SHA-namespaced artifacts in my local Maven cache, e.g. at 
{{~/.m2/repository/org/apache/spark/spark-core_2.10/901dbd0}}.
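That path follows Maven's standard local-repository layout: the group ID with dots turned into directories, then the artifact ID, then the version. Here is a minimal bash sketch, assuming the default {{~/.m2/repository}} root. {{m2_path}} is a hypothetical helper and is not part of Maven:

{code}
# Sketch: where `mvn install` puts an artifact in the local Maven repository.
# Assumes bash and the default ~/.m2/repository root; m2_path is a
# hypothetical helper name.
m2_path() {
  local group_id=$1 artifact_id=$2 version=$3
  # Maven's layout turns each "." in the group ID into a directory separator.
  local group_path=${group_id//.//}
  printf '%s/.m2/repository/%s/%s/%s\n' "$HOME" "$group_path" "$artifact_id" "$version"
}
{code}

For example, after the {{mvn install}} above, {{ls "$(m2_path org.apache.spark spark-core_2.10 "$sha")"}} should list the installed POM and JAR.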

Then I just put {{901dbd0}} as the version in some other project and, voilà, I 
can link against arbitrary Spark SHAs and have many builds co-exist in my local Maven 
cache without them all being named {{1.x.y-SNAPSHOT}}, etc. 
[Here's an example|https://github.com/hammerlab/pageant/blob/56bff88f426dd69083424a91cc35099a2a157f10/pom.xml#L30] 
where I needed a patched Spark before {{1.4.1}} was released with the fix.
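The following bash sketch lists which SHA-versioned builds sit side by side in the cache. {{list_sha_versions}} is a hypothetical helper. It treats any directory name made of 7 to 40 lowercase hex characters as a SHA, which is only a heuristic:

{code}
# Sketch: list the SHA-named Spark builds that co-exist in a local Maven repo.
# Assumes bash; list_sha_versions is a hypothetical helper, and the hex-name
# check is a heuristic for "this version is a git SHA".
list_sha_versions() {
  local repo_root=$1 artifact_id=$2 dir version
  for dir in "$repo_root"/org/apache/spark/"$artifact_id"/*/; do
    version=$(basename "$dir")
    # Skip release and SNAPSHOT versions such as 1.4.1 or 1.5.0-SNAPSHOT.
    if [[ $version =~ ^[0-9a-f]{7,40}$ ]]; then
      echo "$version"
    fi
  done
}
{code}

Usage: {{list_sha_versions ~/.m2/repository spark-core_2.10}}.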

Could any existing continuous build infrastructure be modified to run the {{mvn 
versions:set}} command above and publish artifacts to some Maven repository, 
ID'd by SHA?
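Here is a rough sketch of what such a CI step might run, assuming bash and a Maven repository that accepts {{mvn deploy}}. The repository ID and URL are placeholders, not existing Apache infrastructure. Maven looks up deploy credentials in a {{<server>}} entry of {{settings.xml}} whose ID matches. Setting {{DRY_RUN=1}} prints the commands instead of running them:

{code}
# Sketch of a hypothetical CI step that versions Spark by commit SHA and
# deploys the artifacts. Repository ID/URL are placeholders.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

publish_by_sha() {
  local sha=$1 repo_id=$2 repo_url=$3
  run mvn versions:set -DgenerateBackupPoms=false "-DnewVersion=$sha"
  # altDeploymentRepository takes the id::layout::url form (maven-deploy-plugin 2.x).
  run mvn deploy -DskipTests "-DaltDeploymentRepository=$repo_id::default::$repo_url"
}
{code}

For example: {{publish_by_sha "$(git rev-parse --short HEAD)" sha-snapshots https://repo.example.org/spark-sha}}.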

h3. Binaries

It also makes sense that your ASF user account will not scale for this purpose 
:) OTOH, it should be possible to store these cheaply somewhere. 
{{spark-1.4.1-bin-hadoop2.4.tgz}} is ~234MB and there are ~4000 SHAs from 1.2.0 
to 1.5.0, so hosting one such tarball for every SHA in that range would be roughly 
1TB (234MB × 4000 ≈ 936GB), or a few TB if several Hadoop profiles were built per 
SHA, afaict.
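The estimate is simple multiplication. The bash sketch below treats the number of binary flavors built per SHA as an assumed parameter. The value 4 is illustrative and is not a count of Spark's actual profiles:

{code}
# Back-of-the-envelope storage estimate (integer arithmetic, decimal units).
# tarball_mb and sha_count come from the figures above; profiles (binary
# flavors built per SHA) is an assumption to adjust.
estimate_gb() {
  local tarball_mb=$1 sha_count=$2 profiles=$3
  echo $(( tarball_mb * sha_count * profiles / 1000 ))
}

estimate_gb 234 4000 1   # prints 936
estimate_gb 234 4000 4   # prints 3744
{code}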

Analogous to my previous question: could any existing continuous build 
infrastructure be modified to run the {{mvn versions:set}} command above and 
upload binaries somewhere that could hold more than just the last few? 
These binaries are apparently already being generated, and then mostly deleted 
within ~24 hours as your ASF userdir runs out of space?
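Here is a similar hypothetical sketch for binaries. It makes three assumptions:

* It uses Spark 1.x's {{make-distribution.sh}}, which names its output {{spark-<version>-bin-<name>.tgz}}. After {{versions:set}}, the SHA therefore ends up in the file name.
* It builds with a {{hadoop-2.4}} profile.
* It uploads to a placeholder URL that accepts HTTP PUT.

{code}
# Sketch of a hypothetical CI step that builds a SHA-versioned binary tarball
# and uploads it. The upload URL is a placeholder and the Hadoop profile is an
# assumption. Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Name of the tarball make-distribution.sh produces: spark-<version>-bin-<name>.tgz
tarball_name() {
  printf 'spark-%s-bin-%s.tgz\n' "$1" "$2"
}

publish_binary() {
  local sha=$1 name=$2 upload_url=$3
  run mvn versions:set -DgenerateBackupPoms=false "-DnewVersion=$sha"
  run ./make-distribution.sh --name "$name" --tgz -Phadoop-2.4
  # curl -T uploads the file; with a trailing "/" curl appends the file name.
  run curl --fail -T "$(tarball_name "$sha" "$name")" "$upload_url/"
}
{code}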

> Publish nightly snapshots of documentation, maven artifacts, and binary builds
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-1517
>                 URL: https://issues.apache.org/jira/browse/SPARK-1517
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Project Infra
>            Reporter: Patrick Wendell
>            Assignee: Patrick Wendell
>            Priority: Critical
>
> Should be pretty easy to do with Jenkins. The only thing I can think of that 
> would be tricky is to set up credentials so that Jenkins can publish this 
> stuff somewhere on Apache infra.
> Ideally we don't want to have to put a private key on every Jenkins box 
> (since they are otherwise pretty stateless). One idea is to encrypt these 
> credentials with a passphrase and post them somewhere publicly visible. Then 
> the Jenkins build can download the credentials, provided we set the passphrase 
> in an environment variable in Jenkins. There may be simpler solutions as well.
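The encrypted-credentials idea from the issue description could look roughly like this with {{openssl enc}}. This is a sketch under several assumptions. The environment variable {{SPARK_PUBLISH_PASSPHRASE}}, the helper names, and the download URL are all hypothetical. The {{-pbkdf2}} option needs OpenSSL 1.1.1 or newer:

{code}
# Offline, once: encrypt the credentials file; only the .enc output is posted publicly.
encrypt_creds() {
  openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in "$1" -out "$2" -pass env:SPARK_PUBLISH_PASSPHRASE
}

# On a Jenkins box: download the encrypted file from a (hypothetical) public URL.
fetch_creds() {
  curl --fail -sS -o "$2" "$1"
}

# Decrypt using the passphrase Jenkins exports as SPARK_PUBLISH_PASSPHRASE.
decrypt_creds() {
  openssl enc -d -aes-256-cbc -pbkdf2 \
    -in "$1" -out "$2" -pass env:SPARK_PUBLISH_PASSPHRASE
}
{code}

Decryption must use the same cipher and key-derivation options as encryption. The decrypted file should be deleted once the publish step finishes.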



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
